4月 252019
 

As a SAS programmer, you are asked to do many things with your data -- reading, writing, calculating, building interfaces, and occasionally sending data outside of SAS. One of the most popular outputs you may be tasked with creating is likely a Microsoft Excel workbook. Have you ever heard, “just send me the spreadsheet”?

For an internal project the task is easy, just open the SAS ODS EXCEL destination, run PROC PRINT, and close SAS ODS EXCEL and the workbook spreadsheet is ready. But if the workbook or the spreadsheet is to be delivered somewhere else you may need to spruce it up a bit. Of course, you can manually change virtually everything on the spreadsheet, but that takes lots of employee time. And if the spreadsheet is delivered on a periodic basis, you may not run it the same every time.

Saving you time and money

Suppose you run a PROC PRINT with a “BY” statement and produce a Microsoft Excel workbook with 100 pages. If each of those pages need to be printed and distributed to 100 clients by mail, do you want to be the person who changes each of those to print as a landscape printout? The SAS ODS Excel destination has over 125 options and sub-options that can perform various tasks while the workbook is being written. One such task sets the worksheet to print in “landscape” format.

As a programmer, I know that when I want to start a new project or learn new software, I look to two places in a book: the index and the table of contents. If I can think of a key word that might help me, I look to the index. But when searching general topics, I use the Table of Contents (TOC). It always frustrates me if the TOC is in alphabetical order, so I decided to write my TOC as groups of options and SAS commands that impacted similar parts or features of the Excel Workbook.

To see this in action, the bullet points I have listed below identify the major topic sections of the book. These are, in fact, chapter titles presented after the introduction:

    • ODS Tagset versus Destination
    • ODS Excel Destination Actions
    • Setting Excel Document Property Values
    • Options That Affect the Workbook
    • Arguments that Affect Output Features
    • Options That Affect Worksheet Features
    • Options That Affect Print Features
    • Column, Row, and Cell Features

Take a look inside

Allow me to describe each of these topics in a few words:

    ODS Tagset versus Destination
    Many people have used the SAS ODS Tagset EXCELXP and will find many parts of the SAS ODS EXCEL destination to be very similar in both syntax and function. A tagset is Proc template code that can be changed by the user, while a SAS ODS destination is a built-in feature of SAS, much like a PROC or FUNCTION that cannot be changed by the users. This section also describes the ID feature of ODS which allows you to write more than one EXCEL workbook at a time.

    ODS Excel Destination Actions
    This area describes ODS features that may not be exclusive to ODS EXCEL but are useful in finding and choosing SAS outputs to be processed.

    Setting Excel Document Property Values
    Here you are shown how to change the comments, keywords, author, title, and other parts of the Excel Property sheet.

    Options That Affect the Workbook
    This section of the book shows you how to name the workbook, create blank worksheets, create a table of contents or index of worksheets within the workbook, change worksheet tab colors, and other options.

    Arguments that Affect Output Features
    The output features described here include changing the output style (coloration of the worksheet sections), finding and using stylesheet anchors, building and using Cascading Style Sheets, changing the Dots Per Inch (DPI) of the output data and or graphs, and adding text to the worksheet.

    Options That Affect Worksheet Features
    SAS has many options that describe the output data. These include titles, footnotes, and byline text. Additionally, SAS can group output worksheets in many ways including by page, by proc, by table, by by-group, or even no separation at all. Data can also be set to “FITTOPAGE” or the height or width can be selected along with adding sheet names or labels to the EXCEL output worksheets.

    Options That Affect Print Features
    Excel has many print features like printing in “black and white” only, centering horizontally or vertically, landscape or portrait, draft quality or standard, selecting the print order of the data, selecting the area to print, EXCEL headers and EXCEL footnotes, and others. All of which SAS can adjust as the workbook is being written.

    Column, Row, and Cell Features
    Finally, SAS can adjust column and row features like adding filters, changing the widths and heights or rows and columns, hiding rows or columns, inserting formulas, and even placing the data somewhere other than row one column one.

Ready to see the full Table of Contents? Click here.

Ready, set, go

My book Exchanging Data From SAS® to Excel: The ODS Excel Destination expands upon the SAS documentation by giving full descriptions and examples including SAS code and EXCEL output for nearly every option and sub-option of the SAS ODS EXCEL software. In addition to this blog, check out a free chapter of my book to get started making your worksheets beautifully formatted. Get ready to follow the money and make your reports come out perfect for publication in no time!

Making the most of the ODS Excel destination was published on SAS Users.

4月 252019
 

I’m excited because in a couple days I will fly to Dallas for SAS Global Forum 2019, the biggest SAS conference of the year, attended by thousands.

If you are coming, I hope you will say hello to me.  If you can’t make it to Dallas, you’ll be glad to know that many presentations will be livecast. Here is the schedule

A few highlights:

Sunday, April 28, 7:00-8:30 pm CT–Opening Session

Monday, April 29, 8:30-10:00 am CT–General Session: Technology Connection

Tuesday, April 30, 3:00-4:00 pm CT–Career Advice We’d Give to Our Kids: A Panel Discussion

Wednesday, May 1, 10:30-11:30 am CT–The Good, the Bad, and the Creepy: Why Data Scientists Need to Understand Ethics

These presentations may not be available after the conference so check the schedule and make sure to tune in at the right time.

 

 

 

 

4月 242019
 

The CUSUM test has many incarnations. Different areas of statistics use different assumption and test for different hypotheses. This article presents a brief overview of CUSUM tests and gives an example of using the CUSUM test in PROC AUTOREG for autoregressive models in SAS.

A CUSUM test uses the cumulative sum of some quantity to investigate whether a sequence of values can be modeled as random. Here are some examples:

  • A sequence of binary values (call them +1 and -1) might appear to be random, like a coin flip, or nonrandom. A random sequence has a cumulative sum that does not deviate far from zero, as I've discussed in a previous about the CUSUM test for randomness of a binary sequence.
  • In quality control, the CUSUM chart and test is used to monitor whether a process is drifting away from its mean. The CUSUM chart is centered around the mean value of the process. The process is said to be "out of control" if the cumulative sums of the standardized deviations exceed a specified range. The documentation for the CUSUM procedure in SAS/QC software includes an example and a page of formulas that describe the statistics behind the CUSUM chart.
  • In time series analysis, the CUSUM statistics use the sequence of residual deviations from a model to indicate whether the autoregressive model is misspecified. The CUSUM statistics are produced by PROC AUTOREG in SAS/ETS software.

Whereas the CUSUM test for a binary sequence uses cumulative sums for a discrete (+1, -1} sequence, the other tests assume that the sequence is a random sequence of normally distributed values. The main idea behind the tests are the same: The test statistic measures how far the sequence has drifted away from an expected value. If the sequence drifts too far too fast, the sequence is unlikely to be random.

CUSUM test for time series

Let's see how the CUSUM test in PROC AUTOREG can help to identify a misspecified model. For simplicity, consider two response variables, one that is linear in time (with uncorrelated errors) and the other that is quadratic in time. If you fit a linear model to both variables, the CUSUM test can help you to see that the model does not fit the quadratic data.

In a previous article, I discussed Anscombe's quartet and created two series that have the same linear fit and correlation coefficient. These series are ideal to use for the CUSUM test because the first series is linear whereas the second is quadratic. The following calls to PROC AUTOREG fit a linear model to each variable.

ods graphics on;
/* PROC AUTOREG models a time series with autocorrelation */
proc autoreg data=Anscombe2;
  Linear: model y1 = x;        /* Y1 is linear. Model is oorrectly specified. */
  output out=CusumLinear cusum=cusum cusumub=upper cusumlb=lower recres=RecursiveResid;
run;
 
proc autoreg data=Anscombe2;
  Quadratic: model y2 = x;     /* Y2 is quadratic. Model is misspecified. */
  output out=CusumQuad cusum=cusum cusumub=upper cusumlb=lower recres=RecursiveResid;
run;

The AUTOREG procedure creates a panel of standard residual diagnostic plots. The panel includes a plot of the residuals and a fit plot that shows the fitted model and the observed values. For the linear data, the residual plots seem to indicate that the model fits the data well:

In contrast, the same residual panel for the quadratic data indicates a systematic pattern in the residuals:

If this were a least squares model, which assumes independence of the residuals, those residual plots would indicate that this data-model combination does not satisfy the assumptions of the least squares regression model. For an autoregressive model, however, raw residuals can be correlated and exhibit a pattern. To determine whether the model is misspecified, PROC AUTOREG supports a special kind of residual analysis that uses recursive residuals.

The recursive residual for the k_th point is formed by fitting a line to the first k-1 points and then forming a standardized residual for the k_th point. The complete formulas are in the AUTOREG documentation. Galpin and Hawkins (1984) suggest plotting the cumulative sums of the recursive residuals as a diagnostic plot. Galpin and Hawkins credit Brown, Durbin, and Evans (1975) with proposing the CUSUM plot of the recursive residuals. The statistics output from the AUTOREG procedure are different than those in Galpin and Hawkin, but the idea and purpose behind the CUSUM charts are the same.

Galpin and Hawkin show a panel of nine plots that display different patterns that you might see in the CUSUM plots. I have reproduced two of the plots from the paper. (Remember, these graphs were produced in 1984!) The graph on the left shows what you should see for a correctly specified model. The cumulative sums stay within a region near the expected value of zero. In contrast, the graph on the right is one example of a CUSUM plot for a misspecified model.

The previous calls to PROC AUTOREG wrote the cumulative sums and the upper and lower boundaries of the confidence region to a data set. You can use PROC SGPLOT to create the CUSUM plot. The BAND statement is used to draw the confidence band:

ods layout gridded columns=2 advance=table;
 proc sgplot data=CusumLinear noautolegend;
    band x=x lower=lower upper=upper;
    series x=x y=cusum / break markers;
    refline 0  /axis=y noclip;
    xaxis grid; yaxis grid;
 run;
 proc sgplot data=CusumQuad noautolegend;
    band x=x lower=lower upper=upper;
    series x=x y=cusum / break markers;
    refline 0  /axis=y noclip;
    xaxis grid; yaxis grid;
 run;
ods layout end;
CUSUM graphs of cumulative  sums of recursive residuals

The graph on the left looks like a random walk on independent normal data. The cumulative sums stay within the colored confidence region. The model seems to fit the data. In contrast, the graph on the right quickly leaves the shaded region, which indicates that the model is misspecified.

In summary, there are many statistical tests that use a CUSUM statistic to determine whether deviations are random. These tests appear in many areas of statistics, including random walks, quality control, and time series analysis. For quality control, SAS supports the CUSUM procedure in SAS/QC software. For time series analysis, the AUTOREG procedure in SAS supports CUSUM charts of recursive residuals, which enable you to diagnose misspecified models.

You can download the SAS program that generates the graphs in this article.

The post A CUSUM test for autregressive models appeared first on The DO Loop.

4月 242019
 
The ODS Excel destination, which became a production feature in SAS 9.4M3 (TS1M3), generates Microsoft Excel workbooks in native XLSX format. This destination generates multiple worksheets per workbook with each output object (e.g., a table or graphic) the destination encounters by default. The ODS Excel destination is also flexible, enabling you to vertically control the worksheet and place output objects wherever you want. This blog demonstrates the destination’s flexibility and how you can modify its default behavior by using the ODS EXCEL statement's SHEET_INTERVAL= option.

Adding tables and graphics on the same Microsoft Excel worksheet

By default, the ODS Excel destination adds a new worksheet for each table and graphic. However, at times, you might not want to use this default behavior. If you want more control over this, the SHEET_INTERVAL= ODS Excel option determines when a new worksheet is created. Valid values for the SHEET_INTERVAL= option include:

  • TABLE (the default value) - new sheet for each table in output
  • NONE - keep the output that follows on the current sheet
  • PAGE - new sheet for each page of SAS output
  • PROC - new sheet beginning at the PROC step boundary
  • BYGROUP - new sheet for each BY group of output
  • NOW - begin a new sheet immediately

The value NOW, new for SAS 9.4M5 (TS1M5), triggers the creation of a new worksheet after the destination encounters the next output object.
As an example of opting not to use the default behavior, consider a case where you have a CONTENTS procedure without any options. This procedure generates three separate worksheets with the data-set attributes, the engine host information, and the variable list.

The following table shows the default output that you receive with three individual worksheets:

However, if you want to place all three objects on a single worksheet, you can do that by setting the option SHEET_INTERVAL="NONE". The option setting SHEET_INTERVAL="PROC" could also be used in this example which would create a new worksheet only when a new procedure is encountered.
The following example illustrates how to use this option to include all your output on the same worksheet:

ods excel file="c:\test.xlsx" options sheet_interval="none");
proc contents data=sashelp.class;
run; 
ods excel close;

Output

Adding text and tables to a new worksheet

Two of the most popular ways to add text on worksheets are to use either the ODS TEXT= statement or the ODSTEXT procedure with the Excel destination. The following example adds text to a worksheet by using the ODS TEXT= statement. You include this statement before each PRINT procedure in this example:

ods excel file="c:\temp\test.xlsx";
ods excel options(sheet_name="Females");
 
ods text="Data for Female Patients";
proc print data=sashelp.class(where=(sex="F"));
run;
 
ods excel options(sheet_name="Males");
ods text="Data for Male Patients";
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
ods excel close;

Output

Notice that the first text string appears at the top of the first worksheet as expected. However, the text from the second ODS TEXT= statement appears at the bottom of this same page rather than at the top of the next worksheet containing the related data table. This behavior illustrates that the ODS TEXT= option is not very flexible. There is no good way to solve this issue.

However, you can use the SAS 9.4 ODSTEXT procedure in combination with the SHEET_INTERVAL= option to move the text string to the appropriate worksheet.
The following example uses PROC ODSTEXT and the SHEET_INTERVAL= option to move the text string "Statistics for Male Patients" to the top of the second worksheet:

ods excel file="c:\temp\test.xlsx";
ods excel options(sheet_name="Females");
 
ods text="Data for Female Patients";
proc print data=sashelp.class(where=(sex="F"));
run;
 
ods excel options(sheet_name="Males" sheet_interval="now");
ods excel options(sheet_name="Males" sheet_interval="none");
 
proc odstext;
   p "Data for Male Patients";
run;
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
ods excel close;

Output

Adding multiple tables or graphs on the same worksheet

This final example demonstrates how you can use the SHEET_INTERVAL= option to add multiple tables and graphics to the same Excel worksheet. First, we use the SHEET_INTERVAL="NONE" option in the first ODS EXCEL statement to place the first table and graph on the same worksheet. Then, the SHEET_INTERVAL="NOW" option is included in the second ODS EXCEL statement option to create a second worksheet and write the second table and graph to that worksheet:

ods graphics / height=2.5in width=3.5in;
ods excel file="c:\scratch\test.xlsx" options(sheet_interval="none");
 
proc print data=sashelp.class(where=(sex="F"));
run;
 
proc sgplot data=sashelp.class(where=(sex="F"));
scatter x=age y=height;
run;
 
ods excel options(sheet_interval="now");
 
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
proc sgplot data=sashelp.class(where=(sex="M"));
scatter x=age y=height;
run;
 
ods excel close;

Output

Learn more

Control the location of tables, graphs, and text with ODS Excel was published on SAS Users.

4月 242019
 
The ODS Excel destination, which became a production feature in SAS 9.4M3 (TS1M3), generates Microsoft Excel workbooks in native XLSX format. This destination generates multiple worksheets per workbook with each output object (e.g., a table or graphic) the destination encounters by default. The ODS Excel destination is also flexible, enabling you to vertically control the worksheet and place output objects wherever you want. This blog demonstrates the destination’s flexibility and how you can modify its default behavior by using the ODS EXCEL statement's SHEET_INTERVAL= option.

Adding tables and graphics on the same Microsoft Excel worksheet

By default, the ODS Excel destination adds a new worksheet for each table and graphic. However, at times, you might not want to use this default behavior. If you want more control over this, the SHEET_INTERVAL= ODS Excel option determines when a new worksheet is created. Valid values for the SHEET_INTERVAL= option include:

  • TABLE (the default value) - new sheet for each table in output
  • NONE - keep the output that follows on the current sheet
  • PAGE - new sheet for each page of SAS output
  • PROC - new sheet beginning at the PROC step boundary
  • BYGROUP - new sheet for each BY group of output
  • NOW - begin a new sheet immediately

The value NOW, new for SAS 9.4M5 (TS1M5), triggers the creation of a new worksheet after the destination encounters the next output object.
As an example of opting not to use the default behavior, consider a case where you have a CONTENTS procedure without any options. This procedure generates three separate worksheets with the data-set attributes, the engine host information, and the variable list.

The following table shows the default output that you receive with three individual worksheets:

However, if you want to place all three objects on a single worksheet, you can do that by setting the option SHEET_INTERVAL="NONE". The option setting SHEET_INTERVAL="PROC" could also be used in this example which would create a new worksheet only when a new procedure is encountered.
The following example illustrates how to use this option to include all your output on the same worksheet:

ods excel file="c:\test.xlsx" options sheet_interval="none");
proc contents data=sashelp.class;
run; 
ods excel close;

Output

Adding text and tables to a new worksheet

Two of the most popular ways to add text on worksheets are to use either the ODS TEXT= statement or the ODSTEXT procedure with the Excel destination. The following example adds text to a worksheet by using the ODS TEXT= statement. You include this statement before each PRINT procedure in this example:

ods excel file="c:\temp\test.xlsx";
ods excel options(sheet_name="Females");
 
ods text="Data for Female Patients";
proc print data=sashelp.class(where=(sex="F"));
run;
 
ods excel options(sheet_name="Males");
ods text="Data for Male Patients";
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
ods excel close;

Output

Notice that the first text string appears at the top of the first worksheet as expected. However, the text from the second ODS TEXT= statement appears at the bottom of this same page rather than at the top of the next worksheet containing the related data table. This behavior illustrates that the ODS TEXT= option is not very flexible. There is no good way to solve this issue.

However, you can use the SAS 9.4 ODSTEXT procedure in combination with the SHEET_INTERVAL= option to move the text string to the appropriate worksheet.
The following example uses PROC ODSTEXT and the SHEET_INTERVAL= option to move the text string "Statistics for Male Patients" to the top of the second worksheet:

ods excel file="c:\temp\test.xlsx";
ods excel options(sheet_name="Females");
 
ods text="Data for Female Patients";
proc print data=sashelp.class(where=(sex="F"));
run;
 
ods excel options(sheet_name="Males" sheet_interval="now");
ods excel options(sheet_name="Males" sheet_interval="none");
 
proc odstext;
   p "Data for Male Patients";
run;
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
ods excel close;

Output

Adding multiple tables or graphs on the same worksheet

This final example demonstrates how you can use the SHEET_INTERVAL= option to add multiple tables and graphics to the same Excel worksheet. First, we use the SHEET_INTERVAL="NONE" option in the first ODS EXCEL statement to place the first table and graph on the same worksheet. Then, the SHEET_INTERVAL="NOW" option is included in the second ODS EXCEL statement option to create a second worksheet and write the second table and graph to that worksheet:

ods graphics / height=2.5in width=3.5in;
ods excel file="c:\scratch\test.xlsx" options(sheet_interval="none");
 
proc print data=sashelp.class(where=(sex="F"));
run;
 
proc sgplot data=sashelp.class(where=(sex="F"));
scatter x=age y=height;
run;
 
ods excel options(sheet_interval="now");
 
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
proc sgplot data=sashelp.class(where=(sex="M"));
scatter x=age y=height;
run;
 
ods excel close;

Output

Learn more

Control the location of tables, graphs, and text with ODS Excel was published on SAS Users.

4月 232019
 

From saving cheetahs to preserving fresh water systems, analytics plays a role in many inspiring conservation projects around the world. Read on to learn about a few of our favorites. 1. Protecting animals from extinction  Learn how analyzing footprints could help save cheetahs in Namibia and other endangered animals around [...]

5 inspiring reads for Earth Day was published on SAS Voices by Alison Bolen

4月 222019
 

Imagine a world where satisfying human-computer dialogues exist. With the resurgence of interest in natural language processing (NLP) and understanding (NLU) – that day may not be far off.

In order to provide more satisfying interactions with machines, researchers are designing smart systems that use artificial intelligence (AI) to develop better understanding of human requests and intent.

Last year, OpenAI used a machine learning technique called reinforcement learning to teach agents to design their own language. The AI agents were given a simple set of words and the ability to communicate with each other. They were then given a set of goals that were best achieved by cooperating (communicating) with other agents. The agents independently developed a simple ‘grounded’ language.

Grounded vs. inferred language


Human language is said to be grounded in experience. People grasp the meaning of many basic words by interaction – not by learning dictionary definitions by rote. They develop understanding in terms of sensory experience -- for example, words like red, heavy, above.

Abstract word meanings are built in relation to more concretely grounded terms. Grounding allows humans to acquire and understand words and sentences in context.

The opposite of a grounded language is an inferred language. Inferred languages derive meaning from the words themselves and not what they represent. In AI trained only on textual data, but not real-world representations, these methods lack true understanding of what the words mean.

What if the AI agent develops its own language we can’t understand?

It happens. Even if the researcher gives the agents simple English words the agent inevitably diverges to its own, unintelligible language. Recently researchers at Facebook, Google and OpenAI all experienced this phenomenon!

Agents are reward driven. If there is no reward for using English (or human language) then the agents will develop a more efficient shorthand for themselves.

That’s cool – why is that a problem?

When researchers at the Facebook Artificial Intelligence Research lab designed chatbots to negotiate with one another using machine learning, they had to tweak one of their models because otherwise the bot-to-bot conversation “led to divergence from human language as the agents developed their own language for negotiating.” They had to use what’s called a fixed supervised model instead.

The problem, there, is transparency. Machine learning techniques such as deep learning are black box technologies. A lot of data is fed into the AI, in this case a neural network, to train on and develop its own rules. The model is then fed new data which is used to spit out answers or information. The black box analogy is used because it is very hard, if not impossible in complex models, to know exactly how the AI derives the output (answers). If AI develops its own languages when talking to other AI, the transparency problem compounds. How can we fully trust an AI when we can’t follow how it is making its decisions and what it is telling other AI?

But it does demonstrate how machines are redefining people’s understanding of so many realms once believed to be exclusively human—like language. The Facebook researchers concluded that it offered a fascinating insight to human and machine language. The bots also proved to be very good negotiators, developing intelligent negotiating strategies.

These new insights, in turn, lead to smarter chatbots that have a greater understanding of the real world and the context of human dialog.

At SAS, we’re developing different ways to incorporate chatbots into business dashboards or analytics platforms. These capabilities have the potential to expand the audience for analytics results and attract new and less technical users.

“Chatbots are a key technology that could allow people to consume analytics without realizing that’s what they’re doing,” says Oliver Schabenberger, SAS Executive Vice President, Chief Operating Officer and Chief Technology Officer in a recent SAS Insights article. “Chatbots create a humanlike interaction that makes results accessible to all.” The evolution of NLP toward NLU has a lot of important implications for businesses and consumers alike.

Satisfying human-computer dialogues will soon exist, and will have applications in medicine, law, and the classroom-to name but a few. As the volume of unstructured information continues to grow exponentially, we will benefit from AI’s tireless ability to help us make sense of it all.

Further Resources:
Natural Language Processing: What it is and why it matters
White paper: Text Analytics for Executives: What Can Text Analytics Do for Your Organization?
SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, by Teresa Jade, Biljana Belamaric Wilsey, and Michael Wallis
Unstructured Data Analysis: Entity Resolution and Regular Expressions in SAS®, by Matthew Windham
SAS: What are chatbots?
Blog: Let’s chat about chatbots, by Wayne Thompson

Moving from natural language processing to natural language understanding was published on SAS Users.