
October 31, 2018
 

An important step of every analytics project is exploring and preprocessing the data.  This step transforms the raw data to make it useful and of high quality.  It might be necessary, for example, to reduce the size of the data or to eliminate some columns. All these actions accelerate the analytical work that follows.  But equally important is how you "productionize" your data science project.  In other words, how you deploy your model so that business processes can make use of it.

SAS Viya can help with that.  Several SAS Viya applications have been engineered to directly add models to a model repository including SAS® Visual Data Mining and Machine Learning, SAS® Visual Text Analytics, and SAS® Studio. While the recent post on publishing and running models in Hadoop on SAS Viya outlined how to build models, this post will focus on the process to deploy your models with SAS Model Manager to Hadoop.

SAS Visual Data Mining and Machine Learning on SAS Viya contains a pipeline interface to assist data scientists in finding the most accurate model.  In that pipeline interface, you can do several tasks such as import score code, score your data, download score API code or download Base SAS scoring code.  Or you may decide, once you have a version ready, to move the model out of the development environment by registering your analytical model in a model repository.

Registered models show up in SAS Model Manager and are copied to the model repository.   That repository provides long-term storage and includes version control.  It's a powerful tool for managing and governing your analytical models.  A registered version of your model will never get lost, even if it's deleted from your development environment.   SAS models are not the only kind of models that SAS Model Manager can handle:  Python, R, and MATLAB models can also be imported.

SAS Model Manager can read, write, and manage the model repository and provides actions for editing, comparing, testing, publishing, validating, and monitoring models, as well as tracking their lineage and history.  It also allows you to easily demonstrate your compliance with regulations and policies. You can organize models into different projects.   Within a project it's possible to test, deploy and monitor the performance of the registered models.

Deploying your models

Deploying, a key step for any data scientist and model manager, brings the models into production processes. Kick off deployment by publishing your models.  SAS Model Manager can publish models to systems used for batch processing or to applications where real-time execution of the models is required.   Let's have a look at how to publish an analytical model to a Hadoop cluster and run the model in the Hadoop cluster.  In doing so, you can score the data where it resides and avoid any data movement.

  1. Create the Hadoop public destination.

The easiest way to do this is via the Visual Interface.  Go to SAS Environment Manager and click on the Publish destinations icon:

Click on the new destination icon:

Important:

October 29, 2018
 

CASL is a language specification that can be used by the SAS client to interact with and provide easy access to Cloud Analytic Services (CAS).  CASL is a statement-based scripting language with many uses and strengths including:

  • Specifying CAS actions to submit requests to the CAS server to perform work and return results.
  • Evaluating and manipulating the results returned by an action.
  • Creating user-defined actions and functions and creating the arguments to an action.
  • Developing analytic pipelines.

CASL programs are submitted with PROC CAS, which enables you to program and execute CAS actions from the SAS client and use the results to prepare the parameters for a subsequent action.  A single PROC CAS statement can contain several CASL programs.  With the CAS procedure you can run any CAS action supported by the server, load new action sets into the server, use multiple sessions to perform asynchronous execution, and operate on parameters and results as variables using the function expression parser.
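To make that concrete, here is a minimal sketch of a PROC CAS program that runs a SAS-provided action and reuses its result in the same session (the CASUSER caslib and the CARS table are assumptions for the example):

proc cas;
   /* run an action and capture its result in a CASL variable */
   table.tableInfo result=ti / caslib="CASUSER" name="CARS";

   /* pull a value out of the result table for later use */
   nRows = ti.TableInfo[1, "Rows"];
   print nRows;

   /* the same session can then run a follow-up action on the table */
   simple.summary / table={caslib="CASUSER", name="CARS"};
run;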

CASL, and the CAS actions, provide the most control, flexibility and options when interacting with CAS.  One can use DATA Step, CAS-enabled PROCS and CASL for optimal flexibility and control.  CASL works well with traditional SAS interfaces and the Base SAS language.

Each CAS action belongs to an action set.  Each action set is further categorized by product (for example, VA, VS, VDMML).  In addition to the many CAS actions supplied by SAS, as of SAS® Viya™ 3.4 you can create your own actions using CASL.  Developing and utilizing your own CAS actions allows you to further customize your code and increase your ability to work with CAS in a manner that best suits you and your organization.

About user-defined action sets

A user-defined action set is a CASL program that is stored on the CAS server for processing.  Since the action set is stored on the CAS server, the CASL statements can be written once and executed by many users. This reduces the need to exchange files of common code between users.  Note that you cannot add, remove, or modify a single user-defined action: you must redefine the entire action set.

Before creating any user-defined actions, test your routines and functions first to ensure they execute successfully in CAS when submitted from the programming client.  To create user-defined actions, use the defineActionSet action in the builtins action set and add your code.  You also need to modify your code to use CASL functions such as SEND_RESPONSE, so the resulting objects on the server are returned to the client.

Developing new actions by combining SAS-provided CAS actions

One method for creating user-defined CAS actions is to combine one or more SAS-provided CAS actions into a user-defined CAS action.  This allows you to execute just one PROC CAS statement that calls all of the user-defined CAS actions.  This is beneficial if you repeatedly run many of the same actions against a CAS table.  An example of this is shown below. If you would like a copy of the actual code, feel free to leave a reply below.

In this example, four user-defined CAS actions named listTableInfo, simplefreq, detailfreq, and corr have been created by using the corresponding SAS-provided CAS actions tableInfo, freq, freqTab, and correlation.  These four actions return information about a CAS table, simple frequency information, detailed frequency and tabulate information, and Pearson correlation coefficients respectively.  These four actions are now part of the newly created user-defined action set myActionSet.  When this code is executed, the log will display a note that the new action set has been added.
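The original code isn't reproduced in this post, but here is a minimal sketch of how the first two of those actions might be defined with the defineActionSet action; the parameter names (tbl, col) and the action bodies are assumptions for illustration:

proc cas;
   builtins.defineActionSet /
      name="myActionSet"
      actions={
         {name="listTableInfo",
          parms={{name="tbl", type="string"}},
          definition="
             /* wrap the SAS-provided tableInfo action */
             table.tableInfo result=res / name=tbl;
             send_response(res);
          "},
         {name="simplefreq",
          parms={{name="tbl", type="string"},
                 {name="col", type="string"}},
          definition="
             /* wrap the SAS-provided freq action */
             simple.freq result=res / table=tbl inputs={col};
             send_response(res);
          "}
      };
run;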

Once the new action set and actions have been created, you can call all four or any combination of them via a PROC CAS statement.  Specify the user-defined action set, user-defined action(s), and parameters for each.
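For instance, a hedged sketch of such a call, using the two actions sketched above against an assumed CAS table named CARS:

proc cas;
   /* user-defined actions are called just like SAS-provided ones */
   myActionSet.listTableInfo result=r1 / tbl="CARS";
   print r1;

   myActionSet.simplefreq result=r2 / tbl="CARS" col="Origin";
   print r2;
run;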

Developing new actions by writing your own code

Another way to create user-defined CAS actions is to apply user-defined code, functions, and statements instead of SAS-provided CAS actions.

In this example, two user-defined CAS actions have been created, bdayPct and sos.  These actions belong to the new user-defined action set myFunctionSet.
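That code isn't reproduced here either, so the following is only a rough sketch of what a purely CASL-coded action set might look like; the internals shown for bdayPct (a classic birthday-problem calculation) and sos (a simple sum of squares) are illustrative guesses, not the original implementations:

proc cas;
   builtins.defineActionSet /
      name="myFunctionSet"
      actions={
         {name="bdayPct",
          parms={{name="n", type="int64"}},
          definition="
             /* chance that at least two of n people share a birthday */
             pNoMatch = 1;
             do i = 0 to n-1;
                pNoMatch = pNoMatch * (365 - i) / 365;
             end;
             result.pctShared = 100 * (1 - pNoMatch);
             send_response(result);
          "},
         {name="sos",
          parms={{name="a", type="double"}, {name="b", type="double"}},
          definition="
             /* sum of squares of the two inputs */
             result.sumOfSquares = a*a + b*b;
             send_response(result);
          "}
      };
run;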

To call one or both actions, specify the user-defined action set, user-defined action(s), and parameters for each.
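With the parameter names assumed above, the calls might look like this:

proc cas;
   myFunctionSet.bdayPct / n=23;
   myFunctionSet.sos / a=3 b=4;
run;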

The results for each action are shown in the log.

Save and load custom actions across CAS sessions

User-defined action sets only exist in the current CAS session.  If the current CAS session is terminated, the program to create the user-defined action set must be executed again unless an in-memory table is created from the action set and the in-memory table is subsequently persisted to a SASHDAT file.  Note: SASHDAT files can only be saved to path-based caslibs such as Path, DNFS, HDFS, etc.  To create an in-memory table and persist it to a SASHDAT file, use the actionSetToTable and save CAS actions.
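Here is a minimal sketch of that two-step save, assuming a path-based caslib named Public:

proc cas;
   /* write the action set definition to an in-memory CAS table */
   builtins.actionSetToTable /
      actionSet="myFunctionSet"
      casOut={name="myFunctionSetTbl", caslib="Public", replace=TRUE};

   /* persist the in-memory table to a SASHDAT file */
   table.save /
      table={name="myFunctionSetTbl", caslib="Public"}
      name="myFunctionSetTbl.sashdat"
      caslib="Public"
      replace=TRUE;
run;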

To use the user-defined action set, it needs to be restored from the saved SASHDAT file.  This is done with the actionSetFromTable action.
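A matching sketch for a new session, again assuming the Public caslib:

proc cas;
   /* load the saved SASHDAT file back into memory */
   table.loadTable /
      path="myFunctionSetTbl.sashdat"
      caslib="Public"
      casOut={name="myFunctionSetTbl", caslib="Public", replace=TRUE};

   /* recreate the user-defined action set from the table */
   builtins.actionSetFromTable /
      table={name="myFunctionSetTbl", caslib="Public"}
      name="myFunctionSet";
run;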

More about CASL programming and CAS actions

Check out these resources for further information on programming in the CASL language and running actions with CASL.

How to use CASL to develop and work with user-defined CAS actions was published on SAS Users.

October 22, 2018
 

This blog post was also written by Reece Clifford.

Who's responsible for x, y, z sales territory? What's the largest number of people they engaged with in a month? What type of location leads to the best response from the meeting?

To get the complete answer to these sales team-related questions, you need to trust your data. You need to be able to cut and slice high-quality data to prepare for analytics to drive innovation in your company. With SAS Data Preparation alongside SAS Decision Manager, you can do all this. Its many features allow you to perform out-of-the-box column and row transformations to increase your data quality and build the foundations for data-driven innovation.

This blog will discuss how you can leverage SAS Decision Manager to enrich data when preparing it through SAS Data Preparation.

The use case

As posed above, we want to create a SAS Data Preparation plan to map a sales person to a postcode area. We use a SAS Decision Manager rule to find the sales person for a postcode area and map the person to the address. To trigger the rule, we are going to call it from SAS Data Preparation.

In SAS Decision Manager we import a csv file to create a Lookup Table mapping a sales person to a postcode area. Lookup Tables are tables of key-value pairs and provide the ability to access reference data from business rules.

Next, we create a rule to map a postcode and sales person. A rule specifies conditions to be evaluated and actions to be taken if those conditions are satisfied. Rules are grouped together into rule sets. Most rules correspond to the form:

if condition_expressions then action_expressions

For our rule, we are going to have an incoming postcode plus a record ID. The postcode is assumed to be a UK postcode. We extract the first two characters of the postcode and look up the sales person for that area in the Lookup Table that we have just imported.

The rule outputs the sales person (representative) and the record ID. When we have tested and published the rule, it's ready to be used in a SAS Data Preparation Plan.

In SAS Data Preparation, we load a table with address data that we want to enrich by the appropriate sales person.

  1. We need to make sure the table column names and rule input parameter names match. Therefore, we are renaming the field ADDRESS_ID to ID, as ID is the rule input name. The second rule input parameter is Postcode which is the same as in the table, therefore no action is needed.

  2. We can then call the previously-created rule in SAS Decision Manager to map a sales person to an area. This is done by adding some CASL code to the Code node in the SAS Data Preparation plan. The code is featured below with a brief explanation of its parameters.
    As the rule has two output parameters, we receive only two columns when executing the code step.

CASL Code

loadactionset "ds2";
action runModel submit / 
	modelTable={name="MONITORRULES", caslib="DCMRULES"}
	modelName="Mon_Person"
	table= {name= _dp_inputTable, caslib=_dp_inputCaslib}
	casout= {name= _dp_outputTable, caslib=_dp_outputCaslib};

Parameter settings for the CASL call

  • modelTable name: Name of the table where the rule set was published to.
  • modelTable caslib: Name of the caslib where the rule set was published to.
  • modelName: Name of the decision flow to execute.
  • table name: Table name of the decision flow input data (set to _dp_inputTable).
  • table caslib: Caslib of the decision flow input data (set to _dp_inputCaslib).
  • casout name: Table name of the decision flow output data (set to _dp_outputTable).
  • casout caslib: Caslib of the decision flow output data (set to _dp_outputCaslib).

 

Decision Manager Publishing Dialogue

 

  3. We then want to bring back the columns from the input table. We do this by joining the table in the SAS Data Preparation plan to the original table (again) on the rule output field ID and the table's field ADDRESS_ID.

Conclusion

We have answered our initial question of which sales person is mapped to which region by enriching our data in a user-friendly, efficient process in SAS Data Preparation. We can now begin to gain further insight from our data to answer more of the questions posed at the beginning of the blog and help drive innovation. This can be done with additional insight from SAS Decision Manager or functions in SAS Data Preparation, either in the current plan or by using the output table in another plan. Ultimately, this will facilitate data-driven innovation via reporting or advanced analytics in your organisation.

Using SAS Decision Manager to enrich the data prep process was published on SAS Users.

October 19, 2018
 

Did you know that you can now chat with SAS Technical Support? Technical Chat enables you to quickly engage with a knowledgeable consultant when you have a SAS question or need help with troubleshooting an issue.

Technical Chat is a great tool for quick questions like these:

  • “What does this error mean?”
  • “How do I apply my new license?”
  • “What release of the operating system is supported on SAS® 9.4?”
  • “How do I activate JMP® without an internet connection?”
  • “Which function can I use to obtain the antilog of a value?”
  • “What is the status of my track?”

How to start a Technical Chat

To get started, click the orange Technical Chat button on select Technical Support web pages. Technical Chat is currently available in the United States and Canada, Monday–Friday from 9 a.m. to 6 p.m. Eastern Time. If the button is not available, technical assistance through other channels is listed on this SAS Technical Support page.  (And don't forget about SAS Support Communities -- peer-to-peer support that's available 24/7.)

Having trouble with a DATA step program or an ODS statement? Specialists in the areas of SAS programming, SAS Studio, and graphics might be available during select afternoon hours. When these specialists are available, you can request their assistance as soon as you click the Technical Chat button. When prompted with What is the nature of your inquiry?, select Usage of Base SAS, SAS/Studio or graphics. For all other questions, select All other products and/or tasks. Generalists are available throughout the day to answer questions.

Ensure that you have the following information available:

  • Site number
  • Operating system
  • Release of SAS, including the maintenance level

When the chat begins, the chat agent’s name appears at the top of the window, as in the following example:

Although every effort is made to resolve your question during the chat, sometimes the chat agent needs to open a track with a subject matter expert. Your question will still be addressed with the same urgency and professionalism that you are accustomed to when working with SAS Technical Support!

Your feedback counts!

When the chat is complete, you can request an emailed copy of the chat transcript. You can also rate your chat experience and provide feedback. Your responses are important to us as we continue to evaluate and improve our Technical Chat services.

Try it out. . . . Chat with SAS Technical Support!

Solve your SAS questions with Technical Chat was published on SAS Users.

October 15, 2018
 


Old and new SAS users alike learned the tricks of the data trade from our Little SAS Book! We hope these fun tips from our exercise and project book teach you even more about how to master the data analytics game!

From Rebecca Ottesen:

Tip #1: Grouping Quantitative Variables
My favorite tip to share with students and SAS users is how to use PROC FORMAT to group quantitative variables into categories. A format can be created with a VALUE statement that specifies the ranges relevant to the category groupings. Then, this format can be applied with a FORMAT statement during an analysis to group the variable accordingly (don't forget the CLASS statement when applicable). You can also create categorical variables in the DATA step by applying the format in an assignment statement with a PUT function.
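Here is a minimal sketch of the idea, using the SASHELP.HEART sample table (the variable names come from that table; the cut points are arbitrary):

proc format;
   value agegrp
      low-<40 = 'Under 40'
      40-<60  = '40 to 59'
      60-high = '60 and over';
run;

/* apply the format during the analysis to group a quantitative variable */
proc means data=sashelp.heart mean;
   class ageatstart;
   format ageatstart agegrp.;
   var cholesterol;
run;

/* or create a categorical variable in the DATA step with PUT */
data heart_grouped;
   set sashelp.heart;
   age_group = put(ageatstart, agegrp.);
run;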

From Lora Delwiche:

Tip #2: Commenting Blocks of Code
This tip I learned from fellow SAS Press author Alan Wilson at SAS Global Forum 2008 in San Antonio. It might be a bit overly dramatic to say that this tip changed my life, but that’s not far from the truth! So, I am paying this tip forward. Thank you, Alan!

To comment out a whole block of code, simply highlight the lines of code, hold down the control key, and press the forward slash (/). SAS will take those lines of code and turn them into comments by adding a /* to the beginning of each line and an */ at the end of each line.

To convert the commented lines back to code, highlight the lines again, hold down the control and shift keys, and press the forward slash (/). This works in both the SAS Windowing environment (Display Manager) and SAS Enterprise Guide.

If you are using SAS Studio as your programming interface, you comment the same way, but to uncomment, just hold down the control key and then press the forward slash.

From Susan J. Slaughter:
Tip #3: Susan's Macro Mottos
There is no question that writing and debugging SAS macros can be a challenge. So I have two "macro mottos" that I use to help keep me on track.

“Remember, you are writing a program that writes a program.”

This is the most important concept to keep in mind whenever you work with SAS macros. If you feel the least bit confused by a macro, repeating this motto can help you to see what is going on. I speak from personal experience here. This is my macro mantra.

“To avoid mangling your macros, always write them one piece at a time.”

This means, write your program in standard SAS code first. When that is working and bug-free, then add your %MACRO and %MEND statements. When they are working, then add your parameters, if any, one at a time. If you make sure that each macro feature you add is working before you add another one, then debugging will be vastly simplified.
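As a minimal sketch of that progression (the dataset and statistic are chosen only for illustration):

/* step 1: get the standard SAS code working first */
proc means data=sashelp.class;
   var height;
run;

/* step 2: wrap the working code in %MACRO and %MEND */
%macro heightreport;
   proc means data=sashelp.class;
      var height;
   run;
%mend heightreport;

%heightreport

/* step 3: add parameters one at a time */
%macro heightreport(dsn, var);
   proc means data=&dsn;
      var &var;
   run;
%mend heightreport;

%heightreport(sashelp.class, height)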

And, this is the best time ever to learn SAS! When I first encountered SAS, there were only two ways that I could get help. I could either ask another graduate student who might or might not know the answer, or I could go to the computer center and borrow the SAS manual. (There was only one.) Today it's totally different.

I am continually AMAZED by the resources that are available now—many for FREE. Here are four resources that every new SAS user should know about:

1. SAS Studio
This is a wonderful new interface for SAS that runs in a browser and has both programming and point-and-click features. SAS Studio is free for students, professors, and independent learners. You can download the SAS University Edition to run SAS Studio on your own computer, or use SAS OnDemand for Academics via the Internet.

2. Online classes
Two of the most popular self-paced e-learning classes are available for free: SAS Programming 1: Essentials, and Statistics 1. These are real classes which in the past people paid hundreds of dollars to take.

3. Videos
You can access hundreds of SAS training videos, tutorials, and demos at support.sas.com/training. Topics range from basic (What is SAS?) to advanced (SAS 9.4 Metadata Clustering).

4. Community of SAS users
If you encounter a problem, it is likely that someone else faced a similar situation and figured out how to solve it. On communities.sas.com you can post questions and get answers from SAS users and developers. On the site, www.lexjansen.com, you can find virtually every paper ever presented at a SAS users group conference.

If you want even more tips and tricks, check out our Exercises and Projects for The Little SAS Book, Fifth Edition! Let us know if you enjoyed these tips in the comment boxes below.

New to SAS? Ready to learn more? Check out these tips and tricks from the authors of Exercises and Projects for The Little SAS Book, 5th Edition was published on SAS Users.

October 10, 2018
 

Deep learning (DL) is a subset of neural networks, which have been around since the 1960's. Computing resources and the need for large amounts of training data were the crippling factors for neural networks. But with the growing availability of computing resources such as multi-core machines, graphics processing unit (GPU) accelerators and specialized hardware, DL is becoming much more practical for business problems.

Financial institutions use a large number of computations to evaluate portfolios, price securities, and financial derivatives. For example, every cell in a spreadsheet potentially implements a different formula. Time is also usually of the essence so having the fastest possible technology to perform financial calculations with acceptable accuracy is paramount.

In this blog, we talk to Henry Bequet, Director of High-Performance Computing and Machine Learning in the Finance Risk division of SAS, about how he uses DL as a technology to maximize performance.

Henry discusses how the performance of numerical applications can be greatly improved by using DL. Once a DL network is trained to compute analytics, using that DL network becomes drastically faster than more classic methodologies like Monte Carlo simulations.

We asked him to explain deep learning for numerical analysis (DL4NA) and the most common questions he gets asked.

Can you describe the deep learning methodology proposed in DL4NA?

Yes, it starts with writing your analytics in a transparent and scalable way. All content that is released as a solution by the SAS financial risk division uses the "many task computing" (MTC) paradigm. Simply put, when writing your analytics using the many task computing paradigm, you organize code in SAS programs that define task inputs and outputs. A job flow is a set of tasks that will run in parallel, and the job flow will also handle synchronization.

Fig 1.1 A Sequential Job Flow

The job flow in Figure 1.1 visually gives you a hint that the two tasks can be executed in parallel. The addition of the task into the job flow is what defines the potential parallelism, not the task itself. The task designer or implementer doesn’t need to know that the task is being executed at the same time as other tasks. It is not uncommon to have hundreds of tasks in a job flow.

Fig 1.2 A Complex Job Flow

Using that information, the SAS platform, and the Infrastructure for Risk Management (IRM) is able to automatically infer the parallelization in your analytics. This allows your analytics to run on tens or hundreds of cores. (Most SAS customers run out of cores before they run out of tasks to run in parallel.) By running SAS code in parallel, on a single machine or on a grid, you gain orders of magnitude of performance improvements.

This methodology also has the benefit of expressing your analytics in the form of Y= f(x), which is precisely what you feed a deep neural network (DNN) to learn. That organization of your analytics allows you to train a DNN to reproduce the results of your analytics originally written in SAS. Once you have the trained DNN, you can use it to score tremendously faster than the original SAS code. You can also use your DNN to push your analytics to the edge. I believe that this is a powerful methodology that offers a wide spectrum of applicability. It is also a good example of deep learning helping data scientists build better and faster models.

Fig 1.3 Example of a DNN with four layers: two visible layers and two hidden layers.

The number of neurons of the input layer is driven by the number of features. The number of neurons of the output layer is driven by the number of classes that we want to recognize, in this case, three. The number of neurons in the hidden layers as well as the number of hidden layers is up to us: those two parameters are model hyper-parameters.

How do I run my SAS program faster using deep learning?

In the financial risk division, I work with banks and insurance companies all over the world that are faced with increasing regulatory requirements like CCAR and IFRS17. Those problems are particularly challenging because they involve big data and big compute.

The good news is that new hardware architectures are emerging with the rise of hybrid computing. Computers are increasingly built as a combination of traditional CPUs and innovative devices like GPUs, TPUs, FPGAs, and ASICs. Those hybrid machines can run significantly faster than legacy computers.

The bad news is that hybrid computers are hard to program, and each of them is specific: if you write code for a GPU, it won't run on an FPGA, and it won't even run on different generations of the same device. Consequently, software developers and software vendors are reluctant to jump into the fray, and data scientists and statisticians are left out of the performance gains. So there is a gap, a big gap in fact.

To fill that gap is the raison d’être of my new book, Deep Learning for Numerical Applications with SAS. Check it out and visit the SAS Risk Management Community to share your thoughts and concerns on this cross-industry topic.

Deep learning for numerical analysis explained was published on SAS Users.

October 8, 2018
 

When was the last time you or your colleagues wanted access to data and tools to produce reports and dashboards for a business need? Probably within the last hour. Self-service BI applications – gaining popularity as we speak – make gaining insights and decision making faster. But they've also generated a greater need for governance.

Part of governance is understanding the data lifecycle or data lineage. For example, a co-worker performed some modifications to a dataset and used it to produce a report that you would like to use to help solve a business need.  How can you be sure that the information in this report is accurate? How did the producer of the report calculate certain measures? On what original data set was the report based?

SAS provides many tools to help govern platforms and solutions.  Let’s look at one of those tools to understand the data lifecycle: SAS Lineage Viewer.

Here we have a report created to explore and visualize telecommunications data using SAS Visual Analytics.  The report shows our variable of interest, cross-sell and up-sell flag, and its relationship to other variables. This report will be used to target customers for cross-sell or up-sell.

This report is based on an Analytical Base Table (ABT) that was created by joining two data sets:

  1. Usage information from a subset of customers who have contacted customer care centers.
  2. Cleansed demographics data.

The name of the joined dataset the report is based on is LG_FINAL_ABT.

To make sure we understand the data behind this report we’ll explore it using a lineage viewer (you will need to be assigned to the "Data Management Business User” or “Data Management: Lineage” group, which an administrator can help you with).  From the applications menu, select Explore Lineage.

 

We’ll click on Search for Subjects and search for the report we were just reviewing: Telecommunications.

I’ll enter Telecommunications in the search field then select the Telecommunications report.

The first thing I see is the LG_Final_ABT CAS table my report is dependent on.

If I click on the + sign on the top right corner of the data, LG_Final_ABT, I can see all the other relationships to that CAS table.  There is a Model Studio project, two Visual Analytics Reports (including the report we looked at), and a data view that are all dependent on the LG_FINAL_ABT CAS Table.  This diagram also shows us that the LG_FINAL_ABT CAS table is dependent on the Public CAS library.  We also see that the LG_FINAL_ABT CAS table was loaded into CAS from the LG_FINAL_ABT.sashdat file.

Let’s explore the LG_FINAL_ABT.sashdat file to see its lineage. Clicking on the + expands the view. In the following diagram, I expanded all the remaining items to see the full data lifecycle.

This image shows us the whole data life cycle.  From LG_FINAL_ABT.sashdat we see that it is dependent on the Create Final LG ABT data preparation plan.  That plan is dependent on two CAS tables: LG_CUSTOMER and LG_ORIG_ABT.  The data lineage viewer shows us that the LG_CUSTOMER table was loaded in from a csv file (lg_customer.csv) and the LG_ORIG_ABT CAS table was loaded in from a SAS data set (lg_orig_abt.sas7bdat).

To dive deeper into the mashups and data manipulations that took place to produce LG_FINAL_ABT.sashdat, we can open the data preparation plan.  To do this I’ll right click on Create Final LG ABT and select Actions then Prepare Data.

Here is the data preparation plan.  At the top you can see that the creator of this data set performed five steps – Gender Analysis, Standardize, Remove, Rename and Join.

To get details into each of these steps, click on the titles at the top.  Clicking on Gender Analysis, I see that a gender analysis was performed based on the customer_name field and the results were added to the data set in a variable named customer_name_GND.

Clicking on the Standardize title, I see that there were two standardization tasks performed on the original data set. One for customer state and the other for customer phone number.  I can also see that the results were placed in new fields (customer_state_STND and customer_primary_phone_STND).

Clicking on the Remove title, I see that three variables were dropped from the final dataset.  These variables were the original ones that the user had “fixed” in the previous steps: customer_gender, customer_state, and customer_primary_phone.

Clicking on the Rename title, I see that the new variables have been renamed.

The last step in the process is a join. Clicking on the Join title I see that LG_CUSTOMER was joined with LG_ORIG_ABT based on an inner join on Customer_ID.

We have just walked through the data lineage or data lifecycle for the dataset LG_FINAL_ABT, using SAS tools. I now understand how the data in the report we were looking at was generated. I am confident that the information that I gain from the report will be accurate.

Since sharing information and data among co-workers has become so common, it's now more crucial than ever to think about the data lifecycle. When you gain access to a report that you did not create, it is always a good idea to check the underlying data to ensure that you understand it and that any business insights gained are accurate. Also, if you are sharing data with others and you want to make modifications to it, you should always check the lineage to ensure that you won't be undermining someone else's work with changes you make. Thanks to SAS Visual Analytics, all the necessary tools to review data lineage are available within one interface.

Keep track of where data originated with data lineage in SAS was published on SAS Users.

October 5, 2018
 

In my earlier blog, I described how to create maps in SAS Visual Analytics 8.2 if you have an ESRI shapefile with  granular geographies, such as counties, that you wish to combine into regions. Since posting this blog in January 2018, I received a lot of questions from users on a range of mapping topics, so I thought a more general post on using – and troubleshooting - custom polygons in SAS Visual Analytics on Viya was in order. Since version 8.3 is now generally available, this post is tailored to the 8.3 version of SAS Visual Analytics, but the custom polygon functionality hasn’t really changed between the 8.2 and 8.3 releases.

What are custom polygons?

Custom polygons are geographic boundaries that enable you to visualize data as shaded areas on the map; such maps are sometimes referred to as choropleth maps. For example, suppose you work for a non-profit organization which is trying to decide where to put a new senior center. So you create a map that shows the population of people over 65 years of age by US census tract. The darker polygons suggest a larger number of seniors, and thus a potentially better location to build a senior center:

SAS Visual Analytics 8.3 includes a few predefined polygonal shapes, including countries and states/provinces. But if you need something more granular, you can upload your own polygonal shapes.

How do I create my own polygonal shapes?

To create a polygonal map, you need two components:

  1. A dataset with a measure variable and a region ID variable. For example, you may have population as a measure, and census tract ID as a region ID. A simple frequency can be used as a measure, too.
  2. A “polygon provider” dataset, which contains the same region ID as above, plus geographic coordinates of each vertex in each polygon, a segment ID and a sequence number.

So where do I get this mysterious polygon provider? Typically, you will need to search for a shapefile that contains the polygons you need, and do a little bit of data preparation. Shapefile is a geographic data format supported by ESRI. When you download a shapefile and look at it on the file system, you will see that it contains several files. For example, my 2010 Census Tract shapefile includes all these components:

Sometimes you may see other components present as well. Make sure to keep all components together.

To prepare this data for SAS Visual Analytics, you have two options.

Preparing shapefile for SAS Visual Analytics: The long way

One method to prepare the polygon provider is to run PROC MAPIMPORT to convert the shapefile into a SAS dataset, add a sequence ID field and then load into the Cloud Analytic Services (CAS) server in SAS Viya. The sequence ID is mandatory, as it helps SAS Visual Analytics to draw the lines connecting vertices in the correct order.

A colleague recently reached out for help with a map of Census block groups for Chatham County in North Carolina. Let’s look at his example:

The shapefile was downloaded from here. We then ran the following code on my desktop:

libname geo 'C:\...\Data';
 
proc mapimport datafile="C:\...\Data\Chatham_County__2010_Census_Block_Groups.shp"
out=work.chatham_cbg;
run;
 
data geo.chatham_cbg;
set  chatham_cbg;
seqno=_n_;
run;

We then manually loaded the geo.chatham_cbg dataset in CAS using self-service import in SAS Visual Analytics.  If you are not sure how to upload a dataset to CAS, the SAS Visual Analytics documentation covers self-service import.

Preparing shapefile for SAS Visual Analytics: The short way

The second, much shorter option is the %shpimprt macro. The macro will automatically run PROC MAPIMPORT, create a sequence ID variable and load the table into CAS. Here's an example:

%shpimprt(shapefilepath=/path/Chatham_County__2010_Census_Block_Groups.shp, id=GEOID, outtable=Chatham_CBG, cashost=my_viya_host.com,   casport=5570, caslib='Public');

For this macro to work, the shapefile must be copied to a location that your SAS Viya server can access, and the code needs to be executed in an environment that has SAS Viya installed. So, it wouldn’t work if I tried to run it on my desktop, which only has SAS 9.4 installed. But it works beautifully if I run it in SAS Studio on my SAS Viya machine.

Configuring the polygon provider

The next step is to configure the polygon provider inside your report. I provided a detailed description of this in my earlier blog, so here I’ll just summarize the steps:

  • Add your data to the SAS Visual Analytics report, locate the region ID variable, right-click and select New Geography
  • Give it a name and select Custom Polygonal Shapes as geography type
  • Click on the Custom Polygon Provider box and select Define New Polygon Provider
  • Configure your polygon provider by selecting the library, table and ID column. The values in your ID column must match the values of the region ID variable in the dataset you are visualizing. The ID column, however, does not need to have the same name as in the visualization dataset.
  • If necessary, configure advanced options of the polygon provider (more on that in the troubleshooting section of this blog).

If all goes well, you should see a preview of your polygons and a percentage of regions mapped. Click OK to save your geographic item, and feel free to use it in the Geo Map object.

I followed your instructions, but the map is not working. What am I missing?

I observed a few common troubleshooting issues with custom maps, and all are fairly easy to fix.  The symptoms and solutions below summarize them.
 

Symptom: In the Geographic Item preview, 0% of the regions are mapped.
Solution: Check that the values in the region ID variable match between the main dataset and the polygon provider dataset.

Symptom: I successfully created the map, but the colors of the polygons all look the same. I know I have a range of values, but the map doesn't convey the differences.
Solution: In your main dataset, you probably have missing region ID values or region IDs that don't exist in the polygon provider dataset. Add a filter to your Geo Map object to exclude region IDs that can't be mapped.

Symptom: Only a subset of regions is rendered.
Solution: You may have too many points (vertices) in your polygon provider dataset. SAS Visual Analytics can render up to 250,000 points. If you have a large number of polygons represented in a detailed way, you can easily exceed this limit. You have two options, which you can mix and match:

(1)    Filter the map to show fewer polygons.

(2)    Reduce the level of detail in the polygon provider dataset using PROC GREDUCE. See example here. Also, if you imported data using the %shpimprt macro, it has an option to reduce the dataset.

Symptom: In the Geographic Item preview, the note shows that 100% of the regions are mapped, but the regions don't render, or the regions are rendered in the wrong location (e.g., in the middle of the ocean) and/or at an incorrect scale.
Solution: This is probably the trickiest issue, and the most likely culprit is an incorrectly specified coordinate space code (EPSG code). The EPSG code corresponds to the type of projection applied to the latitude and longitude in the polygon provider dataset (and the originating shapefile). Projection is a method of displaying points from a sphere (the Earth) on a two-dimensional plane (flat surface). See this tutorial if you want to know more about projections.

There are several projection types and numerous flavors of each type. The default EPSG code used in SAS Visual Analytics is EPSG:4326, which corresponds to the unprojected coordinate system.  If you open advanced properties of your polygon provider, you can see the current EPSG code.

Finding the correct EPSG code can be tricky, as not all shapefiles have consistent and reliable metadata built in. Here are a couple of things you can try:

(1)    Open your shapefile as a layer in a mapping application such as ArcMap (licensed by ESRI) or QGIS (open source) and view the properties of the layer. In many cases the EPSG code will appear in the properties.

(2)    Go to the location of your shapefile and open the .prj file in Notepad. It will show the projection information for your shapefile, although it may look a bit cryptic. Take note of the unit of measure (e.g., feet), datum (e.g., NAD 83) and projection type (e.g., Lambert Conformal Conic). Then, go to https://epsg.io/ and search for your geography.  Going back to the example for Chatham county, I searched for North Carolina. If more than one code is listed, select a few codes that seem to match your .prj information the best, then go back to SAS Visual Analytics and change the polygon provider Coordinate Space property. You may have to try a few codes before you find the one that works best.

Symptom: I ruled out a projection issue; the note in the Geographic Item preview shows that 100% of the regions are mapped, but the regions still don't render.
Solution: Take a look at your polygon provider preparation code and double-check that the order of observations didn't accidentally get changed. The order of records may change, for example, if you use a PROC SQL join when you prepare the dataset. If you accidentally changed the order of the records prior to assigning the sequence ID, it can result in an illogical order of points which SAS Visual Analytics will have trouble rendering. Remember, sequence ID is needed so that SAS Visual Analytics can render the outlines of each polygon correctly.

You can validate the order of records by mapping the polygon provider using PROC GMAP, for example:

proc gmap map=geo.chatham_cbg data=geo.chatham_cbg;
   id geoid;
   choro geoid / nolegend levels=1;
run;

For example, in image #1 below, the records are ordered correctly. In image #2, the order of records is clearly wrong, hence the lines going crisscross.

 

As you can see, custom regional maps in SAS Visual Analytics 8.3 are pretty straightforward to implement. The few "gotchas" I described will help you troubleshoot some of the common issues you may encounter.

P.S. I would like to thank Falko Schulz for his help in reviewing this blog.

Troubleshooting custom polygon maps in SAS Visual Analytics 8.3 was published on SAS Users.

October 4, 2018
 

You can now conduct a live demo of SAS Visual Analytics on your mobile device to participants who are geographically dispersed by using the Present Screen feature, a tucked-away option in the SAS Visual Analytics app for iPad, iPhone, and Android devices. Let’s say I am looking at a report on my mobile device, and I have questions about a couple of items for my colleagues Joe and Anita, both of whom are located in two different cities. The three of us are able to see the report while I demo it, drawing their attention to specific areas of interest in the report.

Sitting in my office, or from any location where I have a Wi-Fi or cellular connection, I can use the Present Screen feature to do a live shared presentation with Joe and Anita. And I don't have to present just one report. During the live presentation, I can close a report, open a different one, perform interactions, and move around in the app between different reports.

And here’s the real beauty of this feature. Neither Joe nor Anita need to have the SAS Visual Analytics app on a mobile device, or SAS Visual Analytics running on a desktop. The only requirement for participants is that they have a mobile device (could be an iOS, Android, or Windows device), a laptop, or a desktop system.  Internet access via Wi-Fi (or a cellular connection for mobile devices), an email client for receiving an email notification, and a Web browser are necessary. VPN connectivity might be required if the participants' organization requires VPN.

Plus, you can conduct your live presentation for up to 10 people! Before we move on, note that you can also use the iOS feature, AirDrop, on your iOS device to engage participants with the Present Screen feature. This is useful if you’re in a room with a bunch of folks who have iOS devices, and you want to do live sharing.

Ready to try it? Here’s a short checklist of what you need to do a live presentation of SAS Visual Analytics reports.

Requirements for the presenter

  • Use any one of these devices to present your screen for a shared presentation to your participants: iPad, iPhone, Android tablet or smartphone
  • SAS Visual Analytics app installed on your device
  • Connection via Wi-Fi or cellular to the server where the report(s) reside
  • Subscription to the reports that you want to present and share
  • Email client on the iPad or phone
  • VPN connectivity if your organization requires it

Couple of things to note

Say you're presenting to 10 participants via email or AirDrop.  Note that as the presenter, you must have the SAS Visual Analytics app open in your device with the Present Screen feature selected and active for your guests to be able to see your reports. Once you exit the app, the screen presentation session ends and the email or AirDrop invitations are no longer valid.

And a note on participant information. After a participant forwards an email invitation to view the presentation to a colleague, the recipient is required to enter a name and email address before joining your live screen presentation. You, as the presenter, get to see the names of the folks who have joined your screen presentation. Participant names and email addresses simply let the presenter know who has joined the presentation. Neither the participants' names nor their email addresses are validated by the SAS Visual Analytics server.

Supported versions for SAS Visual Analytics reports

You can present and share reports with the Present Screen feature for these versions of SAS Visual Analytics:

7.3, 7.4, 8.1, 8.2, 8.3

How to start the screen presentation

In the Analytics report I'm subscribed to from the SAS Visual Analytics app on my iPad, I choose Present Screen.

Choosing Present Screen in the subscribed report

I am reminded that I can present my screen to a maximum of 10 participants. I click OK.

The app prompts me to send email or choose AirDrop to present my screen to participants - I choose email.

My email client opens, and the app includes instructions along with the link that takes them to my live screen presentation. I enter the email addresses for my participants and send the email.

In the email that Anita receives on her desktop PC, she taps on the link which takes her to her Web browser where my screen presentation is set to start soon.

Anita enters her name and email address, and notes that the presentation has not yet started.

Joe, who has logged on to the presentation from his Android phone, is also presented with the same message on his smartphone.

To begin the screen presentation, I tap on the blinking cursor in the app.

The app reminds me that now my participants can see everything on my iPad screen.

Next, a blue bar at the top of the report indicates that my screen presentation is live and can be seen by Anita and Joe. Now I can begin my report presentation, or exit this report and open a different report to share.

Here is my screen on the iPad Mini where I started my screen presentation:

Anita's screen on her Windows desktop monitor:

Joe's screen on the Android smartphone:

When I am finished with the presentation, I tap on the 'stop' button to end it.

A message displays to indicate that the presentation has ended. Here's an example of that message from Joe's Android smartphone.

Do it live! How to present your screen from the SAS Visual Analytics app was published on SAS Users.

September 26, 2018
 

Here's a challenge.  You're a passenger in an automobile, and you've been asked to evaluate whether the driver's habits behind the wheel are "safe" or "risky."  But there's a catch: you have to collect all of your information with your eyes closed.

Think about it -- with your eyes shut, you're denied important information such as your location, traffic conditions, speed limits and traffic signals, and weather conditions.  Sightless, your only source of data comes from your sense of motion as the vehicle accelerates, slows down, and turns.

Sunish Menon, a Ph.D. researcher at State Farm Insurance, faced this challenge with his team as they designed the data collection scheme for State Farm's Drive Safe and Save program.  Sunish shared his experience and ideas with attendees at the Analytics Experience 2018 conference in San Diego.

Accelerometer: simple measurements with rich results

Sunish's team knew that they were going to build a smartphone app to support the Drive Safe and Save program.  After all, a smartphone can collect a ton of information: location with GPS, phone use during a trip, traffic conditions, trip duration and speed, and more.  But accessing these details has a cost.  Every sensor on a phone consumes precious battery life, and potential users might not be comfortable sharing their location constantly with an insurance company -- even if there is a premium discount at stake.  So what's the minimum amount of information you can collect and still assemble a meaningful profile?  Maybe capturing the changes in speed and direction is enough.

Like people, your smartphone also has a "sense of motion" -- it's called an accelerometer.  As you might guess from its name, an accelerometer is a small electrical sensor that measures acceleration.  For a quick physics refresher, let's review the difference between speed, velocity, and acceleration:

  • Speed: How fast an object is moving, usually expressed as distance over time (example: 10 meters per second).
  • Velocity: How fast an object is moving and in which direction (example: 10 meters per second, to the east).
  • Acceleration: The rate of change in the velocity of an object. Since it's a rate of change, it's expressed as distance over time (speed), per unit of time.  For example, to change speed from 0 to 60 miles per hour in 10 seconds, an object must accelerate at 2.682 meters per second per second, or 2.682 m/s².

 

Your smartphone measures acceleration across three axes, traditionally labeled as x, y, and z.  The measurements are sampled multiple times per second.  Each measurement reflects acceleration across one of the axes.  Taken together, you can get a sense for the phone's overall direction.

I've included Sunish's diagram of how these axes are oriented on a smartphone.  The x axis is horizontal along the face of the phone, and the y axis is vertical along the face.  The z axis is along the perpendicular plane passing through the center of the phone.  Depending on the direction of the phone's movement, acceleration values might have a positive or negative value.

Capturing my commute data

Inspired by Sunish's presentation, I decided to get a bit of hands-on practice with accelerometer data. I installed a free app on my phone to capture the raw data from the accelerometer.  Here's what the data values look like, as measured from the start of my driving commute from work to home.

In these data, the first value is a record counter.  The second "big number" value is a timestamp in Unix epoch format.  That's the number of milliseconds since midnight on January 1, 1970.  And the next three values are the acceleration measurements for the x, y, and z axes, respectively.  Acceleration is measured in meters-per-second squared, or m/s².  For reference, keep in mind that Earth's gravity -- the force that keeps us grounded (literally) -- is about 9.8 m/s² (1 g).

The data from my commute contains over 85,000 measurements, captured over about 30 minutes (it was a busy Friday afternoon).  I used the SERIES plot in PROC SGPLOT to create a simple visualization.  Can you tell where the longest stoplight occurs?  (It's right near the shopping mall -- I really don't like that intersection.)
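Here's a minimal sketch of the kind of code involved; the file name and CSV layout are assumptions based on the description above:

/* read the raw accelerometer export */
data accel;
   infile 'accel_commute.csv' dsd;
   input recnum timestamp_ms x y z;
   /* Unix epoch counts milliseconds from 01JAN1970; SAS datetimes count seconds from 01JAN1960 */
   dt = '01jan1970:00:00:00'dt + timestamp_ms/1000;
   format dt datetime20.;
run;

/* one series per axis */
proc sgplot data=accel;
   series x=dt y=x;
   series x=dt y=y;
   series x=dt y=z;
   xaxis label="Time";
   yaxis label="Acceleration (m/s2)";
run;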

commute accelerometer

Teasing out "events" from the data

In my commute as represented in the above chart, it seems simple enough to locate the mundane events of accelerating, braking, and waiting in traffic.  There are a few spikes and dips that might represent more dramatic braking events, or perhaps a fast start from a traffic light (my car has some pep!).  Let's use some histograms to look at these measurements another way.

histograms of x y z

Most of my commute is uninteresting, as I'm driving at a steady speed or waiting in traffic.  The histogram shows the x axis measurements are centered around 0.  But why don't the y and z axes behave the same?  During my drive, my phone is positioned nearly vertical in a dashboard holder, with perhaps a 30-degree forward tilt.  Gravity works on all of us at about 9.8 m/s².  With my phone at that vertical-ish tilt, you can see most of that force applied to the y axis, with some shared with the z axis.

Since the data collected represents a time series, it makes sense to apply a time series analysis to see if we can decompose its components and make the interesting events more obvious.  In Sunish's case, his team applied this kind of analysis; in SAS, PROC TIMESERIES also offers a SPECTRA statement for spectrum analysis, which provides similar options.
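As a hedged sketch only (this assumes the accel dataset built earlier; check the PROC TIMESERIES documentation for the SPECTRA statement details), a periodogram of one axis might look something like this:

/* spectral analysis of the y-axis readings */
proc timeseries data=accel outspectra=spec;
   var y;
   spectra freq period p;
run;

/* plot the periodogram to look for dominant frequencies */
proc sgplot data=spec;
   series x=period y=p;
run;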

Here's a tip: if you are trying this on your own and you get stuck, post a question to the SAS forecasting/time series community.  Experts are eager to answer!

Confounding factors when analyzing a drive

During a drive, the measurements from the accelerometer "start at zero" (or their natural baseline) only when the phone is lying flat, with the top of the phone pointed toward the front of the car.  But who keeps their phone stationed like that?  When I'm driving alone in my car, my phone is usually in a holder mounted on the dash, positioned nearly vertical, tilted slightly.  Or it's in my pants pocket.

Sunish presented a series of techniques to help control for this -- all of them applying more math than I am qualified to describe.  The smartphone also has a gyroscope sensor, which can measure the phone's "tilt" along any of its axes (labeled as pitch, roll, and yaw).  Combining these measurements with the acceleration readings, as well as controlling for the force of gravity, can help create a more accurate picture of your driving experience.

When I'm not alone in the car, the phone might not stay in one place.  A passenger might pick it up to find directions, or to reference IMDB to settle a bet.  All of those movements will also register on the accelerometer, and how will a "safe driving app" judge these actions?  That's a challenge for analytics.

Safe driver versus risky driver: more than just measurement

Please do not rush to judgement about my driving behavior from this one sample.  In fact, even if you had hundreds of samples of my driving, it would probably be difficult to fairly judge whether I am a high-risk driver.

For insurance companies, assessment of risk is influenced more by how similar you are to known risky populations.  That's why young drivers tend to command higher premiums.  It's not just because they are young, exactly, but it's because insurance companies have to pay out more claims due to accidents caused by young drivers.  The cause might be due to their inexperience and immaturity, but that's almost beside the point.  It's a numbers game.

By collecting data from millions of car trips across a wide range of customers, an insurance company can apply machine learning to discern the patterns of drivers who make claims versus those who don't.  If your driving patterns are scored as too similar to those of other drivers who cause accidents...well, don't expect to receive a discount when you share your driving data.

Programs like State Farm's Drive Safe and Save accomplish more than just "proving" that you're a good driver.  The program incents you to be more conscious of your driving behavior, especially while you have that app running and collecting data.  State Farm provides periodic reports to program subscribers that show how your driving behavior compares (favorably or not) to other drivers in the pool.  The gamification and feedback aspect of the program might do just as much to improve driving as the promise of a discount.

 

Using your smartphone accelerometer to build a safe driving profile was published on SAS Users.