Tech

June 29, 2020
 

Companies have recognized that the key to maintaining customer loyalty and increasing engagement is to anticipate customers’ needs and desires. To that end, companies have invested heavily in AI technologies to create recommendation engines that present offers, communications, and products to fulfill those needs. Nowadays, recommendation engines can be found just about everywhere, from news websites to e-commerce sites to online streaming services. However, customers frequently encounter recommendations that are inappropriate or repetitive. Just because you accidentally clicked on something six months ago doesn’t mean that you should receive 100 variations of that same article or product today.

The challenge is that companies have often focused too much on building advanced AI algorithms to power their recommendation systems, but frequently miss out on rapid changes in the marketplace. Since recommendation algorithms are based heavily on historical behavior, they may fail under rapidly changing environments where new products are introduced, consumer tastes change rapidly, or market conditions deteriorate.

Recommendation engines also frequently fail to account for real-time events and context. For example, during holidays, people’s tastes can be highly seasonal. Using recommendations based on purchases made during other times of the year may have no relevance to what people want today.

Companies are also under pressure to recommend products and content that are most profitable and high value. Unfortunately, the most profitable products may not be the products most preferred by customers. This can result in conflict between marketers who want to push products and data scientists who want to create good recommendation engines. If the marketers override the recommendations, this can result in consumers losing trust in the recommendations.

To address these limitations, companies need to develop an approach to recommendation engines that accounts for the following factors:

  • Business objectives: Which products are most profitable? How do we optimize the recommendations to generate the highest revenue?
  • Context: What do we know about the customer before we deliver a recommendation? Are they at home/school/work/vacation? Did a significant life event occur such as getting married, having a baby, or buying a house?
  • Historical behavior: Analyzing past transactions to generate recommendations. This involves using AI and machine learning techniques such as collaborative filtering, market basket analysis, and factorization machines to look at a customer’s purchase history and compare it to that of others who purchased similar products.
  • Real-time trends: Using real-time information to address sudden changes in consumer demand. This real-time information can come from social media feeds or by analyzing real-time streaming data.

To show how these factors fit together, I want to walk you through an example of building a recommendation engine that incorporates all of them.

Scenario: A cable company would like to develop an app for subscribers that provides real-time recommendations for live TV. To improve the relevancy of the recommendations, they would like to consider several factors. Firstly, they would like to better predict if a family or child is watching at that moment and make age-appropriate recommendations. Secondly, if a show or content is extremely popular right now with other viewers (such as a breaking news event or a sports event in their area), they would like to override the default recommendation with the show that is extremely popular. This can be accomplished in 5 steps:

1. Use factorization machines to analyze historic viewing behavior and come up with personalized recommendations

Factorization machines are among the most powerful recommendation algorithms currently available. They use matrix factorization to predict the missing ratings in a very sparse matrix of users and products. SAS Viya provides a powerful distributed in-memory engine to train factorization machines on extremely large, sparse datasets consisting of thousands of products (or TV shows) and millions of users.

In the example below, we trained a factorization machine on set-top box viewing data. Our target variable was viewing seconds of the show. The factorization machine attempts to predict how long people will watch a program that they haven’t seen before based on the viewing habits of similar viewers. After training the factorization machine, we can generate a prediction for every program that an individual hasn’t watched before. Using this prediction, we can then rank-order all shows by the predicted viewing time from the factorization machine algorithm.
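To make the ranking step concrete, here is a minimal Python sketch of how a second-order factorization machine scores a user/show pair and how those scores can be rank-ordered. It is not the SAS Viya implementation: the function names, the tiny one-hot feature layout, and all of the weights are made-up values for illustration, standing in for parameters that a trained model would supply.

```python
# Hypothetical parameters: in practice these come from a trained model.
# Feature layout (an assumption): [user_A, user_B, show_1, show_2, show_3]
SHOWS = ["show_1", "show_2", "show_3"]
W0 = 600.0                                  # global bias, in viewing seconds
W = [50.0, -20.0, 100.0, 0.0, -100.0]       # linear weight per feature
V = [[1.0, 0.0], [0.0, 1.0],                # latent factors for the two users
     [10.0, 200.0], [300.0, 5.0], [20.0, 20.0]]  # latent factors for the shows


def fm_predict(x, w0, w, v):
    """Second-order factorization machine prediction for feature vector x."""
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        # Standard identity: sum over pairs = 0.5 * ((sum)^2 - sum of squares)
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def rank_shows(user_index, n_users, show_names, w0, w, v):
    """Score every show for one user and return show names, best first."""
    scores = {}
    for j, name in enumerate(show_names):
        x = [0.0] * (n_users + len(show_names))
        x[user_index] = 1.0          # one-hot user indicator
        x[n_users + j] = 1.0         # one-hot show indicator
        scores[name] = fm_predict(x, w0, w, v)
    return sorted(scores, key=scores.get, reverse=True)


print(rank_shows(0, 2, SHOWS, W0, W, V))  # ['show_2', 'show_1', 'show_3']
```

With one-hot inputs, the pairwise term reduces to the dot product of the user's and the show's latent vectors, which is what lets the model score shows a viewer has never watched.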

2. Build predictive models to predict who’s watching

To determine who is watching at any given time, you can use predictive models to estimate whether a child or family is watching. By collecting data on when and where users historically watched family-friendly content, we can train a model that predicts the likelihood that a family or a child is watching TV at that time. Using SAS Visual Data Mining and Machine Learning, users can build scalable modeling pipelines that take in historical viewing data, transform the data for modeling, and build a series of candidate models (such as gradient boosting, neural networks, and random forests). After evaluating model performance on a hold-out sample, the champion model can be published to production and leveraged within a decisioning flow.

3. Use SAS Event Stream Processing to capture what is popular right now

To better understand what’s happening right now, we need a tool that can analyze real-time streaming data and act on it. SAS Event Stream Processing was designed precisely to analyze streaming data before it lands in a data lake or database. Tuning records from set-top boxes, mobile apps, websites, and smart TVs can be aggregated and analyzed as they arrive to determine the most popular shows currently playing for a demographic, region, or genre.
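The idea of aggregating tuning records over a recent window can be sketched in a few lines of Python. This is only an illustration of the concept, not how SAS Event Stream Processing is implemented; the class name, the 60-second window, and the assumption that events arrive in timestamp order are all mine.

```python
from collections import Counter, deque


class PopularityWindow:
    """Counts tuning events per show over the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()      # (timestamp, show), oldest first
        self.counts = Counter()

    def add(self, timestamp, show):
        """Record one tuning event; assumes timestamps arrive in order."""
        self.events.append((timestamp, show))
        self.counts[show] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_show = self.events.popleft()
            self.counts[old_show] -= 1
            if self.counts[old_show] == 0:
                del self.counts[old_show]

    def top(self, n=1):
        """Return the n most-watched shows in the current window."""
        return self.counts.most_common(n)


window = PopularityWindow(window_seconds=60)
for ts, show in [(0, "news"), (10, "news"), (20, "game"), (70, "game"), (75, "game")]:
    window.add(ts, show)
print(window.top(1))  # [('game', 3)]
```

Keeping one window per demographic, region, or genre would give the segmented view described above.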

4. Use SAS Intelligent Decisioning to deploy business rules

SAS Intelligent Decisioning is a solution for orchestrating real-time decisions that incorporate business rules and predictive models. It allows non-technical users to design decision flows using an easy-to-use graphical interface. After a decision flow is created, it can be published as a REST API that can be called in real time from edge devices (such as set-top boxes, mobile apps, and smart TVs) to receive a recommendation. We can also export these decisions and embed them directly within the SAS Event Stream Processing engine. For more sophisticated users with a strong programming background, these business rules can also be coded directly in SAS Event Stream Processing without using Intelligent Decisioning.

In the example below, we can orchestrate a decision flow that determines what to recommend given certain circumstances. If the predictive model predicts that a child or family is watching, then a family-friendly recommendation will be presented. If event stream processing determines that a certain show is extremely popular right now, then it will override the baseline recommendation with the popular show. Otherwise, it will send the recommendation that was generated by the factorization machine.
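The rules in this decision flow can be written down as a small Python function. Treat it as a sketch under stated assumptions: the function name, the two thresholds, and the choice to let the family rule take priority over the popularity rule are illustrative, and a real flow would be designed in SAS Intelligent Decisioning rather than coded this way.

```python
def recommend(p_family, popular_show, popular_share, fm_ranked, family_friendly,
              family_threshold=0.5, popularity_threshold=0.3):
    """Pick one show to recommend.

    p_family        : model probability that a child or family is watching
    popular_show    : most popular show right now (or None)
    popular_share   : share of current viewers watching popular_show
    fm_ranked       : shows ranked by the factorization machine, best first
    family_friendly : set of shows considered family-friendly
    Both thresholds are hypothetical values chosen for this example.
    """
    # Rule 1: a family is probably watching, so stay age-appropriate.
    if p_family >= family_threshold:
        family_pick = next((s for s in fm_ranked if s in family_friendly), None)
        if family_pick is not None:
            return family_pick
    # Rule 2: something is extremely popular right now, so override.
    if popular_show is not None and popular_share >= popularity_threshold:
        return popular_show
    # Default: the personalized factorization machine recommendation.
    return fm_ranked[0]


ranked = ["crime_drama", "cartoon", "news"]
print(recommend(0.8, "big_game", 0.6, ranked, {"cartoon"}))   # cartoon
print(recommend(0.1, "big_game", 0.6, ranked, {"cartoon"}))   # big_game
print(recommend(0.1, "big_game", 0.05, ranked, {"cartoon"}))  # crime_drama
```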

5. Orchestrate the entire decisioning process using SAS Event Stream Processing

To bring this all together into a single flow that can work in real-time, we need a tool that can ingest real-time streaming data, enrich the data with all the relevant information we need to make an intelligent recommendation, aggregate real-time data, and execute the decisioning flow. This will result in a final recommendation. Event stream processing can orchestrate all these elements into a single project.

In the example below, SAS Event Stream Processing takes in real-time streaming set-top box records and then enriches them with data from the customer data warehouse. The event stream processing engine then aggregates real-time TV viewing across all devices and determines which shows are most popular right now. Then it scores the data using the predictive model to determine whether a child or family is watching. Finally, it executes the decision flow created in SAS Intelligent Decisioning to determine what the final recommendation will be. Event Stream Processing has a REST API that allows third-party applications or devices to connect to this flow and receive the requested recommendation.

Conclusion

The example above demonstrates how an organization can design sophisticated recommendation engines that incorporate not only AI algorithms, but also business rules, real-time streaming, and predictive models. This allows businesses to provide far better recommendations than AI algorithms alone can deliver, because context, real-time information, and business objectives are all incorporated when making the final recommendation. By leveraging tools like SAS Event Stream Processing and SAS Intelligent Decisioning, business users can design, orchestrate, and operationalize the entire recommendation process. To learn more, check out these additional resources:

How to improve recommendation engines with real-time context and business rules was published on SAS Users.

June 24, 2020
 

A lookup table is a programming technique where one or more values can be used to retrieve another value. For example, many years ago, I had benzene exposure estimates for 10 years (1940 to 1949) for each of five locations in a factory. Given a year and a job location, I needed to know the benzene concentration.

I would be terribly embarrassed today if anyone saw the first program I wrote to solve the problem! This blog shows a better way that uses temporary arrays to create an n-way lookup table. To keep the example simple, let's use five years of data (1944 to 1948) and four locations (1 to 4).

Temporary arrays

Before we get into the program, let's discuss temporary arrays, one of my favorite SAS tools. Here is an example of a one-dimensional temporary array:

Data Pass_Fail;
   input ID $ Grade1 - Grade5;
   array PF[5] _temporary_ (65 70 55 65 55);
   array Grade[5]; *If you leave off the variable list SAS will use the
                    array name with numbers 1-5 added. In this example
                    the variables will be Grade1, Grade2, etc.;
   array Pass_or_Fail[5] $ 4;
   do i = 1 to 5;
      if Grade[i] ge PF[i] then Pass_or_Fail[i] = 'Pass';
      else if not missing(Grade[i]) then Pass_or_Fail[i] = 'Fail';
   end;
   drop i;
datalines;
001 90 68 52 70 72
002 56 69 72 75 88
;
Title "Listing of Data Set Pass_Fail";
Proc print data=Pass_Fail noobs;
Run;

In this example, the temporary array is called PF (pass fail values), and it has 5 elements. There are no actual variables PF1, PF2, and so on, only array elements PF[1], PF[2], and so on. The initial values of the five passing grades are placed in parentheses following the keyword _temporary_. In many situations, you load the values of the temporary array from a data file.

To keep this first example easy to understand, we will put the initial values in the array statement. You can now compare each student's grade for every test and assign a value of "Pass" or "Fail."

Here is the output:

Note: You can read a blog that I wrote years ago on temporary arrays for another example. 

Example

Now for the two-way table lookup example.

*Two-dimensional table lookup using a temporary array;
data Lookup;
   array Benzene[1944:1948,4] _temporary_; /* ① */

   /* Populate the array */
   if _n_ = 1 then do Year = 1944 to 1948; /* ② */
      do Location = 1 to 4;
         input Benzene[Year,Location] @; /* ③ */
      end;
   end;

   input Subj $ Year Location;
   Benzene_Level = Benzene[Year, Location]; /* ④ */
datalines;
250 200 150 130
90 180 155 90
95 35 170 140
80 50 45 100
40 50 25 15
001 1944 3
002 1948 1
003 1945 4
;
title "Listing of Data Set Lookup";
proc print data=Lookup noobs;
run;

① This ARRAY statement creates an array with two dimensions (you use a comma to create multiple dimensions). To make programming easier to understand, the first dimension of the array uses subscripts 1944 to 1948, rather than 1 to 5 (the colon enables you to specify the lower and upper bounds of an array). Also notice that there are no initial values in this statement—they will be read from data.
② This section of code populates the values in the Benzene temporary array. You use the statement if _n_ = 1 to ensure that this section of code executes only once.
③ The INPUT statement reads one benzene value for each combination of Year and Location. The single trailing @ sign prevents SAS from going to a new line each time the DO loop iterates.
④ Notice how easy it is to retrieve an exposure value, given a value of Year and Location. The first five lines of data are the values used to populate the temporary array.
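For readers who think in other languages, the same two-way lookup can be sketched in Python with a dictionary keyed by (year, location). This is only an analogy to the temporary array, not a SAS feature; the variable names are mine, while the exposure values and subject records are copied from the data lines above.

```python
YEARS = range(1944, 1949)      # 1944 through 1948
LOCATIONS = range(1, 5)        # locations 1 through 4

# One row per year, one column per location (same order as the data lines).
RAW = [
    [250, 200, 150, 130],
    [90, 180, 155, 90],
    [95, 35, 170, 140],
    [80, 50, 45, 100],
    [40, 50, 25, 15],
]

# Build the lookup table once, keyed by (year, location).
benzene = {
    (year, loc): RAW[i][j]
    for i, year in enumerate(YEARS)
    for j, loc in enumerate(LOCATIONS)
}

subjects = [("001", 1944, 3), ("002", 1948, 1), ("003", 1945, 4)]

# Retrieve the exposure level for each subject's year and location.
lookup = [(subj, year, loc, benzene[(year, loc)]) for subj, year, loc in subjects]
for row in lookup:
    print(row)
```

As with the 1944:1948 array bounds in the SAS program, keying on the actual year avoids having to translate years into positions 1 to 5.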

You can read more about temporary arrays in my book, Learning SAS by Example: A Programmer's Guide, Second Edition.

Comments on this blog are welcome.

Multi-way lookup tables was published on SAS Users.

June 23, 2020
 

When you execute code on the SAS® Cloud Analytic Services (CAS) server, one way you can improve performance is by partitioning a table. A partitioned table enables the data to load faster when that data needs to be grouped by the common values of a variable.

This post explains what it means to partition a table and describes the advantages of a partitioned table. I'll illustrate these concepts with example code that shows improved processing time when you use a partitioned table.

SAS BY-group requires SORT

BY-group processing enables you to group your data by unique variable values. This processing is used, for example, in tasks like merging data sets by a common variable and producing reports that contain data that is grouped by the value of a classification variable. You also use BY-group processing when you execute code in CAS. A key difference in creating BY-groups in SAS versus in CAS is that SAS requires a SORT procedure to sort the data by the specified BY variable in order to create the BY groups.

BY-group sorting implicit in CAS

This step is not required in CAS. When you perform BY-group processing on a CAS table, an implicit sorting action takes place, and each BY-group is distributed to the available threads. This implicit sort process takes place each time that the table is accessed and the BY-groups are requested.

Partitioning a CAS table permanently stores the table such that the values of the BY variable are grouped. Using a partitioned CAS table enables you to skip the implicit sort process each time the table is used, which can greatly improve performance.
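As a loose analogy in Python, partitioning is like grouping the rows by key once and reusing that grouping, rather than regrouping on every access. The sketch below is not how CAS stores or distributes data; the function names and the three sample rows are invented for illustration.

```python
from collections import defaultdict


def group_by(rows, key):
    """Group a list of row dictionaries by the value of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


cars = [
    {"make": "Acura", "model": "MDX"},
    {"make": "Audi", "model": "A4"},
    {"make": "Acura", "model": "TSX"},
]

# Pay the grouping cost once, up front (the analogue of partitioning).
partitioned = group_by(cars, "make")


def models_for(make):
    """Each later BY-group style request reuses the stored grouping."""
    return [row["model"] for row in partitioned.get(make, [])]


print(models_for("Acura"))  # ['MDX', 'TSX']
```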

Partition CAS action example

You create a partitioned CAS table by using the partition CAS action. The following example shows how to partition the CARS table (created from the SASHELP.CARS data set) by the MAKE variable.

caslib _all_ assign;    /* ① */
data casuser.cars;
   set sashelp.cars;    /* ② */
run;
proc cas;
   table.partition /    /* ③ */
      casout={caslib="casuser", name="cars2"}    /* ④ */
      table={caslib="casuser", name="cars", groupby={name="make"}};    /* ⑤ */
quit;

In this code:

①  The CASLIB statement creates SAS librefs that point to all existing caslibs. CASUSER, which is used in the subsequent DATA step, is one of the librefs that are created by this statement.

②  The DATA step creates the CARS CAS table in the CASUSER caslib.

③  The partition action in the CAS procedure is part of the TABLE action set.

④  The casout= parameter contains the caslib= parameter, which points to the caslib where the partitioned CAS table named CARS2 will be stored.

⑤ The table= parameter contains the name= parameter, which lists the name of the table that is being partitioned. It also contains the CASLIB= option, which points to the caslib in which the table is stored. The groupby= parameter contains the name= option, which names the variable by which to partition the table.

You can confirm that the table has been partitioned by submitting the following CAS procedure with the tabledetails action.

proc cas;
   table.tabledetails result=r / level='partition'    /* ① */
      name='cars2'
      caslib='casuser';
run;
   describe r;    /* ② */
quit;

In this code:
① The LEVEL= parameter specifies the aggregation level of the TABLEDETAILS output.
② The DESCRIBE statement writes the output of the TABLEDETAILS action to the log.

The following output is displayed in the resulting log. The KEY column shows the variable the table has been partitioned by.

As mentioned earlier, the purpose of partitioning a table is to improve performance. The following example uses two CAS tables that consist of an ID variable with 10 possible values and 10 character variables. Each table contains 5,000,000 rows. This example illustrates how much performance improvement you can gain by partitioning the tables. In this case, the two tables are merged by the ID variable.

First, you create the tables by submitting the following DATA steps:

data casuser.one;
array vars(10) $8 x1-x10;
do j=1 to 5000000;
id=put(rand('integer',1,10),8.);
do i=1 to 10;
vars(i)=byte(rand('integer',65,90));
end;
output;
end;
drop i j;
run;
 
data casuser.two;
array vars(10) $8 x1-x10;
do j=1 to 5000000;
id=put(rand('integer',1,10),8.);
do i=1 to 10;
vars(i)=byte(rand('integer',65,90));
end;
output;
end;
drop i j;
run;

Merging these two non-partitioned tables by ID in a DATA step serves as the baseline. In the log output shown below, you can see that the total real time (highlighted) was almost 45 seconds.

Partitioned tables example

The next example runs the same DATA step code, but it uses partitioned tables. The first step is to partition the tables, as shown below:

proc cas;
table.partition /                                                   
casout={caslib="casuser", name="onepart"}                             
table={caslib="casuser", name="one", groupby={name="id"}};   
run; 
 
table.partition /                                                   
casout={caslib="casuser", name="twopart"}                             
table={caslib="casuser", name="two", groupby={name="id"}};   
quit;

To merge the two tables and produce the log, submit the following code. (The real time is highlighted in the log.)

data casuser.nopart;
merge casuser.onepart casuser.twopart;
by id;
run;

The output for this code is shown below:

This time, the DATA step took only 25.43 seconds to execute. That is a 43% improvement in execution time!

Partition for improved performance

If your analysis requires you to use the same table multiple times to perform BY-group processing, then I strongly recommend that you partition the table. As the last example shows, partitioning your table can greatly improve performance!

Partition your CAS tables to greatly improve performance was published on SAS Users.

June 19, 2020
 

In the second post of the Getting Started with Python Integration to SAS® Viya® series we will learn about Working with CAS Actions and CASResults Objects. CAS actions are commands sent to the CAS server to run a task, and CASResults objects contain information returned from the CAS server. This post will cover a few basic CAS actions, and how to easily work with the information returned.

CAS Actions Overview

First, you need to understand CAS actions. Whether you are preparing data, modeling, imputing missing values, or even retrieving information about your CAS session, each CAS action performs a single task on the CAS server. CAS actions are organized with other CAS actions in a CAS action set. CAS action sets contain actions that are based on common functionality.

In the end, I like to think of CAS action sets as a package, and all the CAS actions inside an action set as a method.

Getting Started with CAS Actions

We will start with a basic CAS action to view all available loaded action sets in the current CAS session. Before you use any CAS action you must connect to the CAS server. For more information on connecting to the CAS server, visit Part 1 of the series.

I have already made my connection to CAS using SAS Viya for Learners, and my connection object is named conn.

display(conn)
CAS('svflhost.demo.sas.com', 5570, 'peter.test@sas.com', protocol='cas', name='py-session-1', session='efff4323-a862-bd6e-beea737b4249')

Next, to view all available action sets loaded in your CAS session, use the actionSetInfo CAS action from the Builtins action set on the CAS connection object. CAS actions and CAS action sets are case insensitive, and you do not need to qualify the CAS action with the action set name (although it is best practice to include both). In the example below I qualify the CAS action.

conn.builtins.actionsetinfo()

The actionSetInfo CAS action returns the following output:

Your output information might differ depending on your SAS Viya installation.

The output shows a list of the loaded action sets on the CAS server with additional information for each. With a quick scan, it almost looks like a Pandas DataFrame. However, let's go back and run the actionSetInfo CAS action, but this time use the Python type function to see the data type of the output.

type(conn.builtins.actionsetinfo())
swat.cas.results.CASResults

The actionSetInfo CAS action returns a CASResults object. The million dollar question is, what exactly is a CASResults object?

CASResults Object

CAS actions return a CASResults object. A CASResults object is an ordered Python dictionary with additional methods and attributes. There are no specific rules about how many keys are contained in the CASResults object, or what values the keys return. That information depends on the CAS action used.

One thing to note is that CASResults objects are local on your client machine. That is, the data is not in CAS anymore, it has been processed by CAS and returned locally. This is something to keep in mind when you are working with big data in CAS and try to return more data than your local computer can handle.

Let's continue by examining the output from our previous example. The CASResults object returns one key named setinfo, and one value. You can quickly tell by looking at the output.

However, another way to view all the keys in the CASResults object is to use the Python dictionary keys method. In this example, it will return one key named setinfo.

conn.actionsetinfo().keys()
odict_keys(['setinfo'])

Be aware of how I specified the CAS action in the above example. This time I did not qualify it with the Builtins CAS action set. While you can specify the action with or without the CAS action set, I recommend being consistent. I typically do not type the CAS action set, and I'll keep to that style moving forward.

To view a specific value of a CASResults object you can call the key like a typical Python dictionary. In this example you can call the setinfo key to return the key's value.

conn.actionsetinfo()['setinfo']

As mentioned earlier, the output seems to resemble a Pandas DataFrame. To confirm our suspicion, we can use the type function to see the data type of the value returned by the setinfo key.

type(conn.actionsetinfo()['setinfo'])
swat.dataframe.SASDataFrame

Interesting! The CASResults object contains a SASDataFrame for the setinfo key. What exactly is a SASDataFrame?

Understanding a SASDataFrame

A SASDataFrame is a subclass of a Pandas DataFrame. As a result, you can work with them as you normally would a Pandas DataFrame! Another thing to note is a SASDataFrame is local data. Again, as you use CAS you must be aware CAS can handle more data than your local computer can handle. When bringing data locally make sure it's usable on your local machine. We will discuss this in future posts.

Let's work with the SASDataFrame. First, I will create a new variable named df that holds the SASDataFrame from the actionSetInfo action.

df = conn.actionsetinfo()['setinfo']

Next, I'll use some Pandas methods on the df object. Let's start with the head method to view the first five rows.

df.head()

Or filter the data using the loc method. In this scenario I want to find all action sets with the name simple, and return the actionset and label columns.

df.loc[df['actionset'] == 'simple', ['actionset', 'label']]

Or check unique values of the product_name column by using the value_counts method.

df['product_name'].value_counts()
tkcas        8
crsstat      3
crssearch    1

Or...

No that's it. I think you get my point! You can work with a SASDataFrame object like you would a Pandas DataFrame!

CASResults Object With Multiple Keys

Lastly, let's use another CAS action, but this time the CASResults object contains multiple keys, with multiple value types.

In this example I use the serverStatus CAS action from the Builtins CAS action set. The CAS action returns the status of the server.

conn.serverstatus()

From the looks of the output, I see there are three keys: About, server, and nodestatus. However, let's check our assumption by using the keys method on the CASResults object.

conn.serverstatus().keys()
odict_keys(['About', 'server', 'nodestatus'])

In the output there are three keys as expected. Let's look at the data type of each key by writing a quick Python loop to print the name of the key and the value type returned by that key.

for key,value in conn.serverstatus().items():
      print('Key : {}, Value Type : {}'.format(key,type(value)))
Key : About, Value Type : <class 'dict'>
Key : server, Value Type : <class 'swat.dataframe.SASDataFrame'>
Key : nodestatus, Value Type : <class 'swat.dataframe.SASDataFrame'>

The serverStatus CAS action returns a CASResults object with three keys: one key holds a dictionary object, and the other two keys each hold a SASDataFrame.
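Because a CASResults object behaves like a dictionary, the same inspection can be wrapped in a small reusable helper. The sketch below uses a plain OrderedDict as a stand-in so that it runs without a CAS connection; the helper name and the stand-in values (lists in place of SASDataFrames) are assumptions for illustration only.

```python
from collections import OrderedDict


def describe_results(results):
    """Return (key, value type name) pairs for any dictionary-like results object."""
    return [(key, type(value).__name__) for key, value in results.items()]


# Stand-in for the object returned by a CAS action such as serverStatus.
# With a live connection the last two values would be SASDataFrame objects.
fake_results = OrderedDict([
    ("About", {"CAS": "Cloud Analytic Services"}),
    ("server", [["nodes", 1]]),
    ("nodestatus", [["controller"]]),
])

for key, type_name in describe_results(fake_results):
    print("Key : {}, Value Type : {}".format(key, type_name))
```

With a live session, you would store the action's result in a variable once and pass that variable to the helper, since every call such as conn.serverstatus() runs the action on the server again.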

Summary

In conclusion, understanding CAS actions and CASResults objects is essential when working with CAS. A few key points to remember:

  • CAS actions reside in CAS action sets and perform a single task on the CAS server.
  • CAS actions return a CASResults object, which is an ordered Python dictionary with additional methods and attributes.
  • A CASResults object contains one or more keys, each corresponding to a value of any data type.
  • A SASDataFrame can reside in a CASResults object, and is a subclass of a Pandas DataFrame.

That was a quick overview of a few basic CAS actions, and how to work with the CASResults objects they return.  You can check out the SAS documentation below for all available CAS Actions.

Additional Resources

SAS® Viya® 3.5 Actions and Action Sets by Name and Product

Getting Started with Python Integration to SAS® Viya® Series Index

Getting Started with Python Integration to SAS® Viya® - Part 2 - Working with CAS Actions and CASResults Objects was published on SAS Users.

June 19, 2020
 

Index of articles on Getting Started with Python Integration to SAS® Viya®.
Part 1 - Making a Connection
Part 2 - Working with CAS Actions and CASResults Objects
Part 3 - Loading a CAS Action Set
Part 4 - Exploring Caslibs
Part 5 - Uploading Data into CAS
Part 6 - Session vs Global Scope Tables
Part 7 - Exploring Tables? CAS Actions vs SWAT API?

Getting Started with Python Integration to SAS® Viya® - Index was published on SAS Users.

June 17, 2020
 


As a SAS user for most of my professional career and a SAS employee for eight years, I thought I had a pretty good idea of what SAS offers and how SAS® Analytics can solve problems. Yet even I can experience an "aha" moment when I learn something new about SAS that I wish I had known before.

My most recent "aha" moment came a few months ago, right as the Covid-19 pandemic started to unfold. I learned about the SASEFRED interface engine, a component of SAS/ETS® software that allows you to retrieve a wide range of economic and financial data from the Federal Reserve Economic Data (FRED) site. Hosted by the Federal Reserve Bank of St. Louis, FRED is a treasure trove of about 765,000 US and international time series reported at the national, state and county levels.

Being able to retrieve data from FRED directly into SAS has been extremely handy as I've developed interactive dashboards to capture key economic and financial indicators, such as unemployment claims, confidence sentiment, stock market index, volatility, etc. My SAS US Public Sector team's focus on this is part of a bigger effort to come alongside customers since the pandemic began. Our goal is to help all levels of government quickly set up analytical environments and provide timely situational awareness and analytical services. We're glad we're positioned to help public officials address public health and economic consequences of the pandemic.

How to get FRED data

The SASEFRED documentation is self-explanatory. The first step is to obtain a unique FRED API key on the FRED site: https://api.stlouisfed.org/api_key.html. Once that’s done, you are off to the races.

Below is a snippet of the code that I used to pull unemployment data into SAS with SASEFRED.

options validvarname=any
   sslcalistloc="/opt/sas/viya/config/etc/SASSecurityCertificateFramework/cacerts/trustedcerts.pem";
 
libname _all_ clear;
libname mylib "/opt/sas/viya/config/data/cas/default/public"; /** Folder for final datasets **/
libname fred "/opt/sas/viya/config/data/cas/default/public/fred"; /** Folder for intermediate datasets **/
 
/** Ingest FRED Data **/
 
libname fred sasefred "/opt/sas/viya/config/data/cas/default/public/fred"
   OUTXML=UnemploymentClaims
   AUTOMAP=replace
   APIKEY='XXXXXXXXXXXXXXXXXXXX'  /** please request your API at this site https://api.stlouisfed.org/api_key.html **/
   IDLIST='ICSA,ICNSA,IC4WSA,CCSA,CCNSA'
   START='2008-01-01'
   END='2020-06-30'
   freq='w'
   ;
data mylib.UnemploymentClaims;
   set fred.UnemploymentClaims;
run;
proc print data=mylib.UnemploymentClaims;
run;

The output of the above code execution is the table ‘UnemploymentClaims’ with information on weekly reported initial unemployment claims, four-week moving average claims, and continued claims (see Figure 1 below).
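If you are curious what a comparable request looks like outside of SAS, the Python sketch below builds a URL for the FRED web service. It is not what SASEFRED does internally. The endpoint and parameter names reflect my understanding of the public FRED API (series/observations with series_id, api_key, file_type, observation_start, observation_end, and frequency) and should be checked against the FRED API documentation; the helper name is hypothetical, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumed endpoint for observations of a single series.
FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_url(series_id, api_key, start, end, frequency="w"):
    """Build (but do not send) a request URL for one FRED series."""
    params = {
        "series_id": series_id,
        "api_key": api_key,            # request your own key from FRED
        "file_type": "json",
        "observation_start": start,
        "observation_end": end,
        "frequency": frequency,        # "w" mirrors freq='w' in the SAS code
    }
    return FRED_OBSERVATIONS_URL + "?" + urlencode(params)


# Same series IDs and date range as the LIBNAME statement above.
series_ids = ["ICSA", "ICNSA", "IC4WSA", "CCSA", "CCNSA"]
urls = [
    build_fred_url(sid, "XXXXXXXXXXXXXXXXXXXX", "2008-01-01", "2020-06-30")
    for sid in series_ids
]
print(urls[0])
```

Unlike IDLIST=, this sketch issues one request per series, so the five series would need to be combined after retrieval.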

Figure 1

An interactive dashboard with FRED data

The work was done on SAS® Viya, the next generation of SAS Analytics. SAS Viya offers a wide range of robust analytical capabilities, including visual data exploration and reporting. With data ingested into SAS Viya, I am now able to quickly develop an interactive economic dashboard with relevant indicators that will automatically update as the new economic data is reported – all thanks to SASEFRED (see image below).

Check the resources below to learn more. Hope you found this post relevant and useful. Feel free to reach out if you have any questions!

SGF PAPER | Using SAS® Forecast Server and the SASEFRED Engine to Enhance Your Forecast
YOUTUBE | Extracting a Common Time Trend in SAS/ETS

How to access Federal Reserve Economic Data (FRED) with SASEFRED in SAS/ETS® software was published on SAS Users.

June 11, 2020
 

Whether you enjoy debugging or hate it, for programmers, debugging is a fact of life. It’s easy to misspell a keyword, scramble your array subscripts, or (heaven forbid!) forget a semicolon. That’s why we include a chapter on debugging in The Little SAS® Book and its companion book, Exercises and Projects for the Little SAS® Book. We believe that learning to debug makes you a better programmer. Once you understand a bug, you will be better prepared to avoid it in the future.

To help hone your debugging skills, here is an example of the type of problems you can find in our book of exercises and projects. See if you can find the bugs.

Programming exercise

  1. A friend tells you that she is learning SAS and wrote the following program. Unfortunately, the program won’t run. Help her improve her programming skills by finding the mistakes.

TITLE Height, Weight, and BMI;
TITLE2 by Sex and Age Group;
PROC CONTENT DATA = SASHELP.class; RUN;
DATA; SET SASHELP.class;
Height_m = Heigth * 0.0254;
Weight_kg = Weight * 0.4536;
BMI = Weight_kg / Height_m**2;
PROC FORMAT; VALUE
$sex 'M' = 'Boys' 'F' = 'Girls';
VALUE agegp 11-12 = 'Preteens
13-16 = 'Teens';
PROC TABULATE;
CLASS Sex Age; VAR Height_m Weight_kg;
TABLES (Height_m Weight_kg BMI)*
MEAN, Sex Age ALL;
FORMAT Sex $sex. Age agegp.;
RUN;
QUIT;

 

  • a. Examine the SAS data set SASHELP.CLASS including variable attributes.
  • b. Clean up the formatting of the program by adding appropriate indentation and line spacing to show the structure of the DATA and PROC steps. Make changes as needed to make the program conform to standard best practices.
  • c. Fix any errors in the code so that the program will run correctly.
  • d. Add comments to the revised program for each bug that you fix so that your friend can understand her mistakes.

Solution

In the book, we provide solutions for odd-numbered multiple choice and short answer questions, and hints for the programming exercises. So here is a hint for this exercise:

  1. Hint: This program contains four bugs. It also contains “red herrings” that are unusual for SAS code, but nonetheless do run properly and so are not actual bugs. Be sure you know how SAS handles data set names by default. SAS Enterprise Guide can format code for you; right-click the Program window and select Format Code from the pop-up menu. To format code in SAS Studio, click the Format Code icon at the top of the Program window.

For more about The Little SAS Book and its companion book of exercises and projects, check out these blogs:

What's wrong with this code? was published on SAS Users.

June 4, 2020
 

Learning never stops. When SAS had to change this year’s SAS Global Forum (SGF) to a virtual event, everyone was disappointed. I am, however, super excited about all of the papers and the stream of video releases over the last month (and I encourage you to register for the upcoming live event in June). For now, I made a pact with myself to read or watch one piece of SGF-related material per day. While I haven’t hit my goal 100%, I sure have learned a lot from all the reading and viewing. One particular paper, Using Jupyter to Boost Your Data Science Workflow, and its accompanying video by Hunter Glanz caught my eye this week. This post elaborates on one piece of his material: how to save Jupyter notebooks in other file formats.

Hunter’s story

Hunter is a professor who teaches multiple classes using SAS® University Edition, which comes equipped with an integrated Jupyter notebook. His focus is on SAS programming and he requires his students to create notebooks to complete assignments; however, he wants to see the results of their work, not to run their raw code. The notebooks include text, code, images, reports, etc. Let's explore how the students can transform their native notebooks into other, more consumable formats. We'll also discuss other use cases in which SAS users may want to create a copy of their work from a notebook as, say, a .pdf, .html, or .py file, just to name a few.

What you’ll find here and what you won’t

This post will not cover how to use Jupyter notebooks with SAS or other languages. There is a multitude of other resources, starting with Hunter’s work, to explore those topics. This post will cover how to produce other file formats in SAS, Python, and R. I’ll outline multiple methods including a point-and-click method, how to write inline code directly in the notebook, and finally using the command line.

Many of the processes discussed below are language agnostic. When there are distinct differences, I’ll make a note.

A LITTLE about Jupyter notebooks

A Jupyter notebook is a web application that lets users run commands, view responses, include images, and write inline text, all in one place. The all-encompassing notebook helps users tell a complete story without having to use multiple apps. Jupyter notebooks were originally created for the Python language and are now available for many other programming languages. JupyterLab, the notebook’s cousin, is a later, more sophisticated version, but for this writing, we’ll focus on the notebook. The functionality in this use case is similar.

Where do we start? First, we need to install the notebook, unless you're working in SAS University Edition.

Install Anaconda

The easiest way to get started with the Jupyter Notebook App is by installing Anaconda (this will also install JupyterLab). Anaconda is an open source distribution tool for the management and deployment of scientific computing. Out-of-the-box, the notebook from the Anaconda install includes the Python kernel. For use with other languages, you need to install additional kernels.

Install additional language kernels

In this post, we’ll focus on Python, R, and SAS. The Python kernel is readily available after the Anaconda install. For the R language, follow the instructions on the GitHub R kernel repository. I also found the instructions on How to Install R in Jupyter with IRKernel in 3 Steps quite straightforward and useful. Further, here are the official install instructions for the SAS kernel and a supporting SAS Community Library article.

With the additional kernels in place, you should see all available languages when creating a new notebook, as pictured below.

Available kernels list

File conversion methods

Now we’re ready to dive into the export process. Let’s look at three approaches in detail.

Download (Export) option

Once you’ve opened your notebook and run the code, select File-> Download As (appears as Export Notebook As… in JupyterLab).

"Download As"  option in Jupyter notebook

"Export Notebook As" option in JupyterLab

HTML format output

Notice the list of options, some more familiar than others. Select the HTML option and Jupyter converts your entire notebook (text, commands, figures, images, etc.) into a file with a .html extension. Opening the resulting file displays it in a browser, as expected. See the images below for a comparison of the .ipynb and .html files.

SAS code in a Jupyter notebook

Corresponding SAS code notebook in html form

SAS (aka script) format output

Using the Download As -> SAS option renders a .sas file, which is depicted in Enterprise Guide below. Note: when using a different kernel, say Python or R, you have the option to save in that language-specific script format.

SAS code saved from a notebook displayed in Enterprise Guide

One thing to note here is only the code appears in the output file. The markdown code, figures, etc., from the original notebook, are not display options in EG, so they are removed.
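The split between code and everything else is recorded in the .ipynb file itself, which is plain JSON with a list of cells, each tagged with a cell type. As a minimal sketch (standard library only, using a hypothetical two-cell notebook, and not a description of how nbconvert works internally), this pulls out just the code cells:

```python
import json


def code_cells(notebook_json):
    """Return the source text of every code cell, skipping markdown and raw cells."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


# Hypothetical notebook content: one markdown cell and one SAS code cell
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Class report"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": 1,
         "source": ["proc print data=sashelp.class;\n", "run;"]},
    ],
})

for source in code_cells(example):
    print(source)
```

For real notebooks, the export options described in this post remain the simpler route; the sketch only shows where the code lives inside the file.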

PDF format output

There is one (two actually) special case(s) I need to mention. If you want to create PDF (or LaTeX, which is used to create PDF files) output from your notebook, you need additional software. For converting to PDF, Jupyter uses the TeX document preparation ecosystem. If you attempt to download without TeX, the conversion fails, and you get a message to download TeX. Depending on your OS, the TeX software will have a different name but will include TeX in the name. You may also need Pandoc for certain formats. I suggest installing both to be safe. Install TeX from its download site, and do the same for Pandoc.
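Before attempting a PDF export, it can help to check whether the required tools are reachable from your PATH. The sketch below assumes the executables are named xelatex (the TeX engine that appears in the nbconvert log later in this post) and pandoc; your TeX distribution may use different names.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


# Assumed executable names; adjust them for your TeX distribution
missing = missing_tools(["xelatex", "pandoc"])
if missing:
    print("Install before converting to PDF:", ", ".join(missing))
else:
    print("TeX and Pandoc were both found.")
```

The `which` parameter exists only so the check can be exercised without touching the real PATH; in normal use you would leave it at its default.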

Once I’ve completed creating the files, the new files appear in my File Explorer.

New SAS file in Windows File Explorer

Cheaters may never win, but they can create a PDF quickly

Well, now that we’ve covered how to properly convert and download a .pdf file, there may be an easier way. While in the notebook, press the Ctrl + P keys. In the Print window, select the Save to PDF option, choose a file destination and save. It works, but I felt less accomplished afterward. Your choice.

Inline code option

Point-and-click is a perfectly valid option, but let’s say you want to introduce automation into your world. The jupyter nbconvert command provides the capability to transform the current notebook into any format mentioned earlier. All you must do is pass the command with a couple of parameters in the notebook.

In Python, one way to issue the nbconvert command is through the system function of the os library, which passes the command to the operating system shell. The following lines are representative of the general structure.

import os
os.system("jupyter nbconvert myNotebook.ipynb --to html")

An example with Python

The example below is from a Python notebook. The output value of 0 is the command's exit code and indicates success.

Code to create a PDF file from a Python notebook
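As a rough sketch of what such a cell might look like, the following uses the standard subprocess module instead of os.system. The function names and the notebook name myNotebook.ipynb are placeholders, not part of the original example.

```python
import subprocess


def build_nbconvert_command(notebook, to="html"):
    """Assemble the nbconvert call as an argument list (no shell quoting needed)."""
    return ["jupyter", "nbconvert", notebook, "--to", to]


def convert(notebook, to="html"):
    """Run nbconvert and return its exit code; 0 means the conversion succeeded.

    Raises FileNotFoundError if the jupyter executable is not on the PATH.
    """
    completed = subprocess.run(build_nbconvert_command(notebook, to))
    return completed.returncode


# Show the command that would be executed for a PDF export
print(" ".join(build_nbconvert_command("myNotebook.ipynb", to="pdf")))
```

Calling convert("myNotebook.ipynb", to="pdf") in a notebook cell would then run the conversion and return the exit code.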

An example with SAS

As you see with the Python example, the code is just that: Python. Generally, you cannot run Python code in a Jupyter notebook running the SAS kernel. Luckily we have Jupyter magics, which allow us to write and run Python code inside a SAS kernel. The magics are a two-way street and you can also run SAS code inside a Python shell. See the SASPy documentation for more information.

The code below is from a SAS notebook, but is running Python code (triggered by the %%python magic).

Code to create a PDF file from a SAS notebook

The EmployeeChurnSASCode.pdf file is created in the same directory as the original notebook file:

Jupyter file system display in a web browser

An example with R

Things are fairly straightforward in an R notebook. However, you must install and load the nbconvert package.

Code to create an HTML file from an R notebook

The first line installs the package, the second line loads the package, and the third actually does the conversion. Double-check your paths if you run into trouble.

The command line

The last method we look at is the command line. This option is the same regardless of the language with which you’re working. The possibilities are endless for this option. You could include it in a script, use it in code to run and display in a web app, or create the file and email it to a colleague. The examples below were all run on a Windows OS machine using the Anaconda command prompt.

An example with a SAS notebook

Convert sasNotebook.ipynb to a SAS file.

>> ls -la |grep sasNotebook
-rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
 
>> jupyter nbconvert --to script sasNotebook.ipynb
[NbConvertApp] Converting notebook sasNotebook.ipynb to script
[NbConvertApp] Writing 351 bytes to sasNotebook.sas
 
>> ls -la |grep sasNotebook
-rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
-rw-r--r-- 1 jofurb 1049089     369 May 29 14:57 sasNotebook.sas

An example with a Python notebook

Convert 1_load_data.ipynb to a PDF file.

>> ls -la |grep 1_load
-rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
 
>> jupyter nbconvert 1_load_data.ipynb --to pdf
[NbConvertApp] Converting notebook 1_load_data.ipynb to pdf
[NbConvertApp] Writing 27341 bytes to .\notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', '.\\notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', '.\\notebook']
[NbConvertApp] WARNING | b had problems, most likely because there were no citations
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 32957 bytes to 1_load_data.pdf
 
>> ls -la |grep 1_load
-rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
-rw-r--r-- 1 jofurb 1049089  32957 May 29 15:23 1_load_data.pdf

An example with an R notebook

Convert HR_R.ipynb to an R file.

>> ls -la | grep HR
-rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
 
>> jupyter nbconvert HR_R.ipynb --to script
[NbConvertApp] Converting notebook HR_R.ipynb to script
[NbConvertApp] Writing 981 bytes to HR_R.r
 
>> ls -la | grep HR
-rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
-rw-r--r-- 1 jofurb 1049089   1021 May 29 15:44 HR_R.r
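The three conversions above could also be collected into one script. This is a minimal sketch that only builds and prints the commands; the dictionary of jobs mirrors the file names and target formats used in the examples, and the function name is hypothetical.

```python
def nbconvert_commands(jobs):
    """Build one nbconvert command per (notebook, target format) pair."""
    commands = []
    for notebook, target in sorted(jobs.items()):
        if not notebook.endswith(".ipynb"):
            continue  # ignore anything that is not a notebook
        commands.append(["jupyter", "nbconvert", notebook, "--to", target])
    return commands


# Notebook names and formats taken from the three examples above
jobs = {
    "sasNotebook.ipynb": "script",
    "1_load_data.ipynb": "pdf",
    "HR_R.ipynb": "script",
}

for command in nbconvert_commands(jobs):
    print(" ".join(command))
```

Each command list can be handed to subprocess.run to perform the conversion, which keeps the batch easy to schedule or to extend with further notebooks.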

Wrapping things up

Whether you’re a student of Hunter’s, an analyst creating a report, or a data scientist monitoring data streaming models, you may need to transform your work from a Jupyter notebook into a more consumable asset. Regardless of the language of your notebook, you have multiple choices for saving your work, including menu options, inline code, and the command line. This is a great way to show off your creation in a very consumable form.

How to save Jupyter notebooks in assorted formats was published on SAS Users.


June 3, 2020
 

In natural language processing, word vectors play a key role in making technologies such as machine translation and speech recognition possible. A word vector is a row of numeric values where each value captures a dimension of the word’s meaning. Each value represents how closely the word relates to the concept behind that dimension, so the semantics of the word are embedded across the dimensions of the vector. Since similar words have similar vectors, representing words this way simplifies and unifies the operations performed on them.

Word vectors are generated by training on word-word co-occurrence statistics from a large corpus. You can use pre-trained word vectors like GloVe, provided by Stanford University.

Let's talk about how to transform word vector tables from long to wide in SAS, so we can potentially get sentence vectors to process further. Suppose we generate word vectors from the following 3 sentences:

Jack went outside.
Jill likes to draw in the afternoon.
Tony is a boy.

Each word has 2 numeric values (Vector1, Vector2); each value represents how closely the word relates to the concept defined by that dimension. The number of values per word (here VNUM=2) may range from hundreds to thousands in real text analysis scenarios.

Long word vector table

The sample code below generates the sample table shown above and sorts it for further processing.

data HAVE;
  length Word $ 45;
  input SentenceID Word Vector1-Vector2; /*Often 300+ vectors in real data*/
datalines;
1 Jack      0.24011   0.400996
1 went     -0.047581  0.868716
1 outside  -1.197891  1.162238
2 Jill     -0.199579  0.251252
2 likes    -1.935640 -0.288264
2 to       -0.526053 -1.143420
2 draw     -0.736289 -0.794812
2 in       -2.757234  0.506639
2 the      -0.736289 -0.794812
2 afternoon -0.047581 0.868716
3 Tony      0.34032   0.600983
3 is        0.147531  0.968817
3 a         1.347543  2.568323
3 boy      -3.257891  3.172238
;
run;
proc sort data=HAVE;
  by SentenceID;
run;
proc print data=HAVE; run;

If we want to transform the long table above into a wide table as seen below, how can we do this as efficiently and simply as possible? The 14 words above belong to 3 sentences, which results in the following 3 rows with 22 columns (1 + WNUM + WNUM x VNUM = 1 + 7 + 7 x 2 = 22).

Wide word vector table

Please note that we can calculate the maximum number of words in a sentence (WNUM) at runtime with the SAS code below. For the example above, the value of WNUM is 7.

proc sql noprint;
  select max(count) into :wnum from (
    select count(Word) as count from HAVE group by SentenceID 
  );
quit;

In fact, we don’t need any SAS PROC to handle this kind of transformation. A SAS DATA step provides an efficient and convenient way to transform data. The key is to use an ARRAY to map all word vectors from the source table, and then define two ARRAYs to store the output words and vectors in a wide style. These two output arrays must be listed in a RETAIN statement so that their values persist across iterations of the implicit loop, and in a KEEP statement so that they are written by the OUTPUT statement when the step reaches the last row of each SentenceId.

You can see the full SAS code below with detailed comments.

/*Long table to Wide table*/
%let vnum=2; /*vector numbers for a word*/
%let wnum=7; /*max word number in a sentence*/
data WANT;
  set HAVE;
  by Sentenceid;
  array _vector_ [*] vector:;         /*Map to source vectors*/
 
  array _word [ %eval(1*&wnum)] $ 45; /*Array to store WORD in wide table*/
  array _vector [ %eval(&wnum*&vnum)];/*Array to store VECTORS in wide table*/
  retain _word: _vector:;             /*RETAIN during the implicit loop*/
 
  retain _offset_ 0;                  /*Offset of a WORD in a sentence, base 0*/
  if first.Sentenceid then do;
    call missing(of _word[*]);
    call missing(of _vector[*]);
    _offset_=0;
  end;
  else _offset_=_offset_+1;
 
  _word[ _offset_+1 ]=word;           /*Cache current word to array WORD at [ _offset_+1]*/
  do i=1 to dim(_vector_);            /*Cache each vectors to array VECTORS at [_offset_* &vnum +i]*/
    _vector[_offset_* &vnum +i]=_vector_[i]; 
  end;
  keep Sentenceid _word: _vector: ;   /*Keep for output when it hit last.Sentenceid*/
 
  if last.Sentenceid then output;     /*Output the cached WORD and VECTORS*/
run;
 
proc print data=want;run;
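To cross-check the layout outside SAS, the same accumulate-then-output logic can be sketched in plain Python. This is only an illustration with hypothetical function names, not part of the original SAS solution; the rows are copied from the sample table, and each output row has 1 + WNUM + WNUM x VNUM entries.

```python
from collections import Counter


def max_words(rows):
    """Largest number of words in any sentence (WNUM)."""
    return max(Counter(sentence_id for sentence_id, _, _ in rows).values())


def long_to_wide(rows, wnum, vnum):
    """Collect words and vectors per sentence, then pad each row to a fixed width."""
    grouped = {}
    for sentence_id, word, vector in rows:
        entry = grouped.setdefault(sentence_id, {"words": [], "vectors": []})
        entry["words"].append(word)
        entry["vectors"].extend(vector)
    table = []
    for sentence_id, entry in grouped.items():
        # Pad short sentences: empty strings for words, None for missing vectors
        words = entry["words"] + [""] * (wnum - len(entry["words"]))
        vectors = entry["vectors"] + [None] * (wnum * vnum - len(entry["vectors"]))
        table.append([sentence_id] + words + vectors)
    return table


HAVE = [
    (1, "Jack", [0.24011, 0.400996]), (1, "went", [-0.047581, 0.868716]),
    (1, "outside", [-1.197891, 1.162238]), (2, "Jill", [-0.199579, 0.251252]),
    (2, "likes", [-1.935640, -0.288264]), (2, "to", [-0.526053, -1.143420]),
    (2, "draw", [-0.736289, -0.794812]), (2, "in", [-2.757234, 0.506639]),
    (2, "the", [-0.736289, -0.794812]), (2, "afternoon", [-0.047581, 0.868716]),
    (3, "Tony", [0.34032, 0.600983]), (3, "is", [0.147531, 0.968817]),
    (3, "a", [1.347543, 2.568323]), (3, "boy", [-3.257891, 3.172238]),
]

VNUM = 2
WNUM = max_words(HAVE)
WANT = long_to_wide(HAVE, WNUM, VNUM)
for row in WANT:
    print(row)
```

Unlike the DATA step, the sketch holds all sentences in memory before writing them out, which is fine for a small check but is not how the single-pass SAS version works.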

Accordingly, if we need to transform a word vector table back from wide style to long style, we need to generate up to &WNUM rows x &VNUM columns for each sentence, which is the reverse of the logic above. The full SAS code with detailed comments is listed below:

/*Wide table to Long table*/
data HAVE2;
  set WANT; 
 
  array _word [*] _word:;           /*Array _word[] maps to the WORD columns in the wide table*/
  array _vector_ [*] _vector:;      /*Array _vector_[] maps to the VECTOR columns in the wide table*/
 
  length Word $ 45;                 /*Output Word in the long table*/
  array Vector[&vnum];              /*Output Vectors in the long table*/
  do i=1 to &wnum;                  /*Unpack word from array _word[]*/
    word=_word[i];
    if word=" " then continue;      /*Skip the empty slots of short sentences*/
    do j=1 to &vnum;                /*Unpack vectors from array _vector_[]*/
      Vector[j]=_vector_[j + &vnum *(i-1)];
    end;
    keep Sentenceid Word Vector:;
    output;                         /*One wide row generates up to &wnum long rows*/
  end;
run;
proc print data=HAVE2;run;
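The index arithmetic of the reverse direction can be checked the same way. The sketch below is a plain Python illustration with a hypothetical function name and a reduced wide table (WNUM=4, VNUM=2) built from two of the sample sentences; with zero-based indexing, the vectors of word i occupy positions i*VNUM through (i+1)*VNUM - 1.

```python
def wide_to_long(table, wnum, vnum):
    """Unpack each wide row into one (sentence_id, word, vector) row per word."""
    long_rows = []
    for row in table:
        sentence_id = row[0]
        words = row[1:1 + wnum]
        vectors = row[1 + wnum:]
        for i, word in enumerate(words):
            if not word:
                continue  # empty slot of a short sentence
            long_rows.append((sentence_id, word, vectors[i * vnum:(i + 1) * vnum]))
    return long_rows


# Reduced wide table: 1 id + 4 word slots + 4 x 2 vector slots per row
WIDE = [
    [1, "Jack", "went", "outside", "",
     0.24011, 0.400996, -0.047581, 0.868716, -1.197891, 1.162238, None, None],
    [3, "Tony", "is", "a", "boy",
     0.34032, 0.600983, 0.147531, 0.968817, 1.347543, 2.568323, -3.257891, 3.172238],
]

for long_row in wide_to_long(WIDE, wnum=4, vnum=2):
    print(long_row)
```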

To wrap the bidirectional transformation above for general reuse in text analysis, we provide two SAS macros, which are invoked as shown below:

%Long2Wide(data=Have, vnum=2, wnum=7, sid=SentenceId, word=Word, out=Want);
proc print data=Want;run;
 
%Wide2Long(data=Want, vnum=2, wnum=7, sid=Sentenceid, out=Have2, outword=Word, outvector=Vector);
proc print data=Have2;run;

We have demonstrated how to transform a word vector table from a long style to a wide style (or vice versa) efficiently with a SAS DATA step. We have also provided two well-wrapped SAS MACROs for general re-use purposes. To learn more, please check out these additional resources:

Transform word vector tables from long to wide was published on SAS Users.