8月 202019
 

‘Quality‘ means many things to many people. It’s subjective and depends on the industry and product being made, but the fundamental objective is to provide the best product to the right standard associated to fit, form and function. And cost and required profit margin must also be taken into account. [...]

How analytics can help meet the quality standard in manufacturing was published on SAS Voices by Tim Clark

8月 202019
 

‘Quality‘ means many things to many people. It’s subjective and depends on the industry and product being made, but the fundamental objective is to provide the best product to the right standard associated to fit, form and function. And cost and required profit margin must also be taken into account. [...]

How analytics can help meet the quality standard in manufacturing was published on SAS Voices by Tim Clark

8月 202019
 

You can now easily embed a Python script inside a SAS decision within SAS Intelligent Decisioning. If you want to execute in SAS Micro Analytic Service (MAS), you no longer need to wrap it in DS2 code. The new Python code node does it for you. Here is how you can achieve it in less than 5 minutes:

Ready? Steady? Go!

The Python Script

If you want to run the following in MAS:

X1=1
X2=2
if X1 == None:
   X1 = 0
if X2 == None:
   X2 = 0
Y = 0.55 + 1 * X1 + 2 * X2 
print(Y)

Convert it to a Python function to meet the PyMAS requirements:

def execute(X1, X2):
       "Output: Y"
       if X1 == None:
            X1 = 0
       if X2 == None:
            X2 = 0
        Y = 0.55 + 1 * X1 + 2 * X2
       return Y
 
X1=1
X2=2
print(execute(X1,X2))

In a Jupyter Notebook, it will look like this:

Create an input data set to test the results

In SAS Studio V:

cas mysession sessopts=(metrics=true);
caslib _all_ assign;
options dscas;
 
data CASUSER.X1X2 (promote=yes);
length X1 8 X2 8;
X1=1; X2=1; output;
X1=1; X2=2; output;
X1=1; X2=3; output;
X1=1; X2=4; output;
run;
cas mysession terminate;

Create a decision in SAS Intelligent Decisioning 5.3

Choose New Python code file and call it python_logic. Copy the code from the Jupyter Notebook: from def until return Y. Watch out for your indentation!

Save and Close. Go to Variables:

Click on variables X1, X2, Y and change their type to Decimal.

Save the Decision.

Publish the decision to MAS

Test the publishing destination

Click on the published validation. Choose the data set you created:

Run. The code is executed.

Check the execution results

Y is the output of the python function. For the second line in the X1X2 data set, where X1 = 1 X2 =2, we get the result 5.55. Just as in the Jupyter Notebook.

Concepts

About Decisions in SAS

Put simply, there are three main components to a decision in SAS: inputs, logic, and outputs.

Inputs: the decision needs input variables. These can come from a CAS data set, a REST API or manual inputs.

Logic: a decision is defined by business rules, conditions, analytic models, custom code (DS2), etc. The new version allows execution of Python code in PyMAS (see below).

Outputs: a decision computes an output based on inputs and logic.

About SAS Micro Analytic Service (MAS)

A picture says a thousand words; here is a simplified diagram of MAS architecture (thanks to Michael Goddard):

MAS Architecture: Execution engine

You can apply or publish a decision using MAS. The SAS Micro Analytic Service provides the capability to publish a decision into operational environments.

When deployed as part of SAS Decision Manager, MAS is called as a web application with a REST interface by both SAS Decision Manager and by other client applications. MAS provides hosting for DS2 and Python programs and supports a "compile-once, execute-many-times" usage pattern.

The REST interface provides easy integration with client applications and adds persistence and clustering for scalability and high availability.

Prerequisites for Python decisions

You need SAS Intelligent Decisioning 5.3 in SAS Viya 3.4. SAS Intelligent Decisioning 5.3 is the wiz-kid of SAS Decision Manager 5.2. You do not need a certain Python version in your environment, but if you use certain libraries (e.g.: numpy, scipy, etc.), they might depend on the Python version.

Debugging your Python-based decisions

If you cannot replicate the example, it might be useful to consult the MAS logs. Log with MobaXtrem (or the software of your choice) to your server. Browse to the log of the concerned microservice, e.g.: microanalyticservice = MAS.

cd /opt/sas/viya/config/var/log/microanalyticservice/default/

Connect to the node using SFTP and open the log files. Check for errors, such as:

2019-06-27T21:31:12,251 [00000007] ERROR App.tk.MAS – Module ‘python1_0’ failed to compile in user context ‘provider’.

Resolve the Python error, per the messages you find in the log.

Solution for some errors

When you've made changes in your environment and have trouble getting your Python decisions to work, try to restart the following services:

  • decisionmanager
  • compsrv
  • launcher
  • runlauncher
  • microanalyticservice

Acknowledgements

Thanks to Marilyn Tomasic for finding the solution on what to do if you do not get the expected results. Thanks to Yi Jian Ching for sharing his knowledge and material.

References

Execute Python inside a SAS Decision: Learn how in less than 5 minutes was published on SAS Users.

8月 192019
 

I'm old enough to remember when USA Today began publication in the early 1980s. As a teenager who was not particularly interested in current events, I remember scanning each edition for the USA Today Snapshots, a mini infographic feature that presented some statistic in a fun and interesting way. Back then, I felt that these stats made me a little bit smarter for the day. I had no reason to question the numbers I saw, nor did I have the tools, skill or data access to check their work.

Today I still enjoy the USA Today Snapshots feature, but for a different reason. An interesting infographic will spark curiosity. And provided that I have time and interest, I can use the tools of data science (SAS, in my case) and public data to pursue more answers.

In the August 7, 2019 issue, USA Today published this graphic about marijuana use in Colorado. Before reading on, I encourage you to study the graphic for a moment and see what questions arise for you.

Source: USA Today Snapshot from Aug 7 2019

I have some notes

For me, as I studied this graphic, several questions came to mind immediately.

  • Why did they publish this graphic? USA Today Snapshots are usually offered without explanation or context -- that's sort of their thing. So why did the editors choose to share these survey results about marijuana use in Colorado? As readers, we must supply our own context. Most of us know that Colorado recently legalized marijuana for recreational use. The graphic seems to answer the question, "Has marijuana use among certain age groups increased since the law changed?" And a much greater leap: "Has marijuana use increased because of the legal change?"
  • Just Colorado? We see trend lines here for Colorado, but there are other states that have legalized marijuana. How does this compare to Maine or Alaska or California? And what about those states where it's not yet legal, like North Carolina?
  • People '26 and older' are also '18 and older' The reported age categories overlap. '18 and older' includes '18 to 25' and '26 and older'. I believe that the editors added this combined category by aggregating the other two. Why did they do that?
  • Isn't '26 and older' a wide category? '12 to 17' is a 6-year span, and '18 to 25' is an 8-year span. But '26 and older' covers what? 60-plus years?
  • "Coloradoans?" Is that really how people from Colorado refer to themselves? Turns out that's a matter of style preference.

The vagaries of survey results

To its credit, the infographic cites the source for the original data: the National Survey on Drug Use and Health (NSDUH). The organization that conducts the annual survey is the Substance Abuse and Mental Health Services Administration (SAMHSA), which is under the US Department of Health and Human Services. From the survey description: The data provides estimates of substance use and mental illness at the national, state, and sub-state levels. NSDUH data also help to identify the extent of substance use and mental illness among different sub-groups, estimate trends over time, and determine the need for treatment services.

This provides some insight into the purpose of the survey: to help policy makers plan for mental health and substance abuse services. "How many more people are using marijuana for fun?" -- the question I've inferred from the infographic choices -- is perhaps tangential to that charter.

Due to privacy concerns, SAMHSA does not provide the raw survey responses for us analyze. The survey collects details about the respondent's drug use and mental health treatment, as well as demographic information about gender, age, income level, education level, and place of residence. For a deep dive into the questions and survey flow, you can review the 2019 questionnaire here. SAMHSA uses the survey responses to extrapolate to the overall population, producing weighted counts for each response across recoded categories, and imputing counts and percentages for each aspect of substance use (which drugs, how often, how recent).

SAMHSA provides these survey data in two channels: the Public-use Data Analysis System and the Restricted-use Data Analysis System. The "Public-use" data provides annualized statistics about the substance use, mental health, and demographics responses across the entire country. If you want data that includes locale information (such as the US state of residence), then you have to settle for the "Restricted-use" system -- which does not provide annual data, but instead provides data summarized across multi-year study periods. In short, if you want more detail about one aspect of the survey responses, you must sacrifice detail across other facets of the data.

My version of the infographic

I spent hours reviewing the available survey reports and data, and here's what I learned: I am an amateur when it comes to understanding health survey reports. However, I believe that I successfully reverse-engineered the USA Today Snapshot data source so that I could produce my own version of the chart. I used the "Restricted-use" version of the survey reports, which allowed access to imputed data values across two-year study periods. My version shows the same data points, but with these formatting changes:

  • I set the Y axis range as 0% to 100%, which provides a less-exaggerated slope of the trend lines.
  • I did not compute the "18 and over" data point.
  • I added reference lines (dashed blue) to indicate the end of each two-year study period for which I have data points.

Here's one additional data point that's not in the survey or in the USA Today graphic. Colorado legalized marijuana for recreational use in 2012. In my chart, you can see that marijuana use was on the rise (especially among 18-25 years old) well before that, especially since 2009. Medical use was already permitted then (see Robert Allison's chart of the timeline), and we can presume that Coloradoans (!) were warming up to the idea of recreational use before the law was passed. But the health survey measures only reported use, and does not measure the user's purpose (recreational, medical, or otherwise) or attitudes toward the substance.

Limitations of the survey data and this chart

Like the USA Today version, my graph has some limitations.

  • My chart shows the same three broad age categories as the original. These are recoded age values from the study data. For some of the studies it was possible to get more granular age categories (5 or 6 bins instead of 3), but I could not get this for all years. Again, when you push for more detail on one aspect, the "Restricted-use" version of the data pushes back.
  • The "used in the past 12 months" indicators is computed. The survey report doesn't offer this as a binary value. Instead it offers "Used in the past 30 days" and "Used more than 30 days ago but less than 12 months." So, I added those columns together, and I assume that the USA Today editors did the same.
  • I'm not showing the confidence intervals for the imputed survey responses. Since this is survey data, the data values are not absolute but instead are estimates accompanied by a percent-confidence that the true values fall in this certain range. The editors probably decided that this is too complex to convey in your standard USA Today Snapshot -- and it might blunt the potential drama of the graphic. Here's what it would look like for the "Used marijuana in past 30 days" response, with the colored band indicating the 95% confidence interval.

Beyond Colorado: what about other states?

Having done the work to fetch the survey data for Colorado, it was simple to gather and plot the same data for other states. Here's the same graph with data from North Carolina (where marijuana use is illegal) and Maine and California.

While I was limited to the two-year study reports for data at the state level, I was able to get the corresponding data points for every year for the country as a whole:

I noticed that the reported use among those 12-17 years old declined slightly across most states, as well as across the entire country. I don't know what the logistics are for administering such a comprehensive survey to young people, but this made me wonder if something about the survey process had changed over time.

The survey data also provides results for other drugs, like alcohol, tobacco, cocaine, and more. Alcohol has been legal for much longer and is certainly widely used. Here are the results for Alcohol use (imputed 12 months recency) in Colorado. Again I see a decline in the self-reported use among those 12-17 years old. Are fewer young people using alcohol? If true, we don't usually hear about that. Or has something changed in the survey methods with regard to minors?

SAS programs to access NSDUH survey data

On my GitHub repo, you can find my SAS programs to fetch and chart the NSDUH data. The website offers a point-and-click method to select your dimensions: a row, column, and control variable (like a BY group).

I used the interactive report tool to navigate to the data I wanted. After some experimentation, I settled on the "Imputed Marijuana Use Recency" (IRMJRC) value for the report column -- I think that's what USA Today used. Also, I found other public reports that referenced it for similar purposes. The report tool generates a crosstab report and an optional chart, but it also then offers a download option for the CSV version of the data.

I was able to capture that download directive as a URL, and then used PROC HTTP to download the data for each study period. This made it possible to write SAS code to automate the process -- much less tedious than clicking through reports for each study year.

%macro fetchStudy(state=,year=);
  filename study "&workloc./&state._&year..csv";
 
  proc http
   method="GET"
   url="https://rdas.samhsa.gov/api/surveys/NSDUH-&year.-RD02YR/crosstab.csv/?" ||
       "row=CATAG2%str(&)column=IRMJRC%str(&)control=STNAME%str(&)weight=DASWT_1" ||
        "%str(&)run_chisq=false%str(&)filter=STNAME%nrstr(%3D)&state."
   out=study;
  run;
%mend;
 
%let state=COLORADO;
/* Download data for each 2-year study period */
%fetchStudy(state=&state., year=2016-2017);
%fetchStudy(state=&state., year=2015-2016);
%fetchStudy(state=&state., year=2014-2015);
%fetchStudy(state=&state., year=2012-2013);
%fetchStudy(state=&state., year=2010-2011);
%fetchStudy(state=&state., year=2008-2009);
%fetchStudy(state=&state., year=2006-2007);

Each data file represents one two-year study period. To combine these into a single SAS data set, I use the INFILE-with-a-wildcard technique that I've shared here.

 INFILE "&workloc./&state._*.csv"
    filename=fname
    LRECL=32767 FIRSTOBS=2 ENCODING="UTF-8" DLM='2c'x
    MISSOVER DSD;
  INPUT
    state_name   
    recency 
    age_cat 
    /* and so on */

The complete programs are in GitHub -- one version for the state-level two-year study data, and one version for the annual data for the entire country. These programs should work as-is within SAS Enterprise Guide or SAS Studio, including in SAS University Edition. Grab the code and change the STATE macro variable to find the results for your favorite US state.

Conclusion: maintain healthy skepticism

News articles and editorial pieces often use simplified statistics to convey a message or support an argument. There is just something about including numbers that lends credibility to reporting and arguments. Citing statistics is a time-honored and effective method to inform the public and persuade an audience. Responsible journalists will always cite their data sources, so that those with time and interest can fact-check and find additional context beyond what the media might share.

I enjoy features like the USA Today Snapshot, even when they send me down a rabbit hole as this one did. As I tell my children often (and they are weary of hearing it), statistics in the media should not be accepted at face value. But if they make you curious about a topic so that you want to learn more, then I think the editors should be proud of a job well done. It's on the rest of us to follow through to find the deeper answers.

The post A skeptic's guide to statistics in the media appeared first on The SAS Dummy.

8月 192019
 

In parts one and two of this blog series, we introduced the automation of AI (i.e., artificial intelligence) and natural language explanations applied to segmentation and marketing. Following this, we began marching down the path of practitioner-oriented examples, making the case for why we need it and where it applies. [...]

SAS Customer Intelligence 360: Automated AI and segmentation [Part 3] was published on Customer Intelligence Blog.

8月 192019
 

One of my friends likes to remind me that "there is no such thing as a free lunch," which he abbreviates by "TINSTAAFL" (or TANSTAAFL). The TINSTAAFL principle applies to computer programming because you often end up paying a cost (in performance) when you call a convenience function that simplifies your program.

I was thinking about TINSTAAFL recently when I was calling a Base SAS function from the SAS/IML matrix language. The SAS/IML language supports hundreds of built-in functions that operate on vectors and matrices. However, you can also call hundreds of functions in Base SAS and pass in vectors for the parameters. It is awesome and convenient to be able to call the virtual smorgasbord of functions in Base SAS, such as probability function, string matching functions, trig function, financial functions, and more. Of course, there is no such thing as a free lunch, so I wondered about the overhead costs associated with calling a Base SAS function from SAS/IML. Base SAS functions typically are designed to operate on scalar values, so the IML language has to call the underlying function many times, once for each value of the parameter vector. It is more expensive to call a function a million times (each time passing in a scalar parameter) than it is to call a function one time and pass in a vector that contains a million parameters.

To determine the overhead costs, I decided to test the cost of calling the MISSING function in Base SAS. The IML language has a built-in syntax (b = (X=.)) for creating a binary variable that indicates which elements of a vector are missing. The call to the MISSING function (b = missing(X)) is equivalent, but requires calling a Base SAS many times, once for each element of x. The native SAS/IML syntax will be faster than calling a Base SAS function (TINSTAAFL!), but how much faster?

The following program incorporates many of my tips for measuring the performance of a SAS computation. The test is run on large vectors of various sizes. Each computation (which is very fast, even on large vectors) is repeated 50 times. The results are presented in a graph. The following program measures the performance for a character vector that contains all missing values.

/* Compare performance of IML syntax
   b = (X = " ");
   to performance of calling Base SAS MISSING function 
   b = missing(X);
*/
proc iml;
numRep = 50;                            /* repeat each computation 50 times */
sizes = {1E4, 2.5E4, 5E4, 10E4, 20E4};  /* number of elements in vector */
labl = {"Size" "T_IML" "T_Missing"};
Results = j(nrow(sizes), 3);
Results[,1] = sizes;
 
/* measure performance for character data */
do i = 1 to nrow(sizes);
   A = j(sizes[i], 1, " ");            /* every element is missing */
   t0 = time();
   do k = 1 to numRep;
      b = (A = " ");                   /* use built-in IML syntax */
   end;
   Results[i, 2] = (time() - t0) / numRep;
 
   t0 = time();
   do k = 1 to numRep;
      b = missing(A);                  /* call Base SAS function */
   end;
   Results[i, 3] = (time() - t0) / numRep;
end;
 
title "Timing Results for (X=' ') vs missing(X) in SAS/IML";
title2 "Character Data";
long = (sizes // sizes) || (Results[,2] // Results[,3]);   /* convert from wide to long for graphing */
Group = j(nrow(sizes), 1, "T_IML") // j(nrow(sizes), 1, "T_Missing"); 
call series(long[,1], long[,2]) group=Group grid={x y} label={"Size" "Time (s)"} 
            option="markers curvelabel" other="format X comma8.;";

The graph shows that the absolute times for creating a binary indicator variable is very fast for both methods. Even for 200,000 observations, creating a binary indicator variable takes less than five milliseconds. However, on a relative scale, the built-in SAS/IML syntax is more than twice as fast as calling the Base SAS MISSING function.

You can run a similar test for numeric values. For numeric values, the SAS/IML syntax is about 10-20 times faster than the call to the MISSING function, but, again, the absolute times are less than five milliseconds.

So, what's the cost of calling a Base SAS function from SAS/IML? It's not free, but it's very cheap in absolute terms! Of course, the cost depends on the number of elements that you are sending to the Base SAS function. However, in general, there is hardly any cost associated with calling a Base SAS function from SAS/IML. So enjoy the lunch buffet! Not only is it convenient and plentiful, but it's also very cheap!

The post Timing performance in SAS/IML: Built-in functions versus Base SAS functions appeared first on The DO Loop.

8月 192019
 

As you will have read in my last blog, businesses are demanding better outcomes, and through IoT initiatives big data is only getting bigger. This presents a clear opportunity for organisations to start thinking seriously about how to leverage analytics with their other investments. Demands on supply chains have also [...]

Can the artificial intelligence of things make the supply chain intelligent? was published on SAS Voices by Tim Clark

8月 182019
 

Have you ever thought of selling sand on the beach? Neither have I. To most people the mere idea is preposterous. But isn’t it how all great discoveries and inventions are made? Someone comes up with an outwardly crazy, outlandish idea, and despite all the skepticism, criticism, ostracism, ridicule and [...]

Selling sand at the beach was published on SAS Voices by Leonid Batkhan

8月 162019
 

The Output Delivery System (ODS) Graphics procedures provide many options to give you control over the look of your output. However, there are times when your output does not look like you thought it would.

This blog discusses how to solve some common output-related problems that we hear about in Technical Support.

All of the examples in this blog relate to creating scatter plots and bar charts from the same data set, SASHELP.CLASS. This data set, included in your SAS® installation, provides information about heights and ages for both male and female students.

Colors in the output are not as desired

Using the STYLEATTRS statement

The STYLEATTRS statement enables you to define attributes, such as color, for graphical elements.

In the following SGPLOT procedure example, the STYLEATTRS statement defines the colors for the marker symbols on a scatter plot as either blue or pink:

proc sgplot data=sashelp.class;
   styleattrs datacolors=(blue pink);
   scatter x=age y=height / group=sex
   markerattrs=(symbol=circlefilled);
run;

However, after you submit the code, the resulting plot does not use the specified colors. Instead, you see blue and red:

When defining colors for graphical elements, the DATACOLORS= option defines the colors for filled areas, and the DATACONTRASTCOLORS= option defines the colors for marker symbols and lines.

Because the scatter plot is creating marker symbols, you have to change the STYLEATTRS statement to use the DATACONTRASTCOLORS= option instead of the DATACOLORS= option. Here is the revised code:

proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink);
   scatter x=age y=height / group=sex 
   markerattrs=(symbol=circlefilled);
run;

Now, when you submit the updated code, the correct colors appear:

You can find more information about the STYLEATTRS statement in the STYLEATTRS section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Using an attribute map

An attribute map enables you to associate specific values for your plot GROUP= variable with specific graphical attributes.

The attribute map is defined in a data set that includes the following:

  • an ID variable that contains the name of the attribute map definition
  • a VALUE variable that contains the value of the plot statement GROUP= variable
  • any other variables for the attributes that you want to define

In the following example, the attribute map BARCOLORS is defined to associate the group value F with the color pink and the group value M with the color blue. Note that the FILLCOLOR variable is also used to define the colors for the bars of the bar chart.

data attrmap;
   id='barcolors';
      input value $ fillcolor $;
      datalines;
F pink
M blue
;
run;
proc format;
   value $ genderfmt
      'F'='Female'
      'M'='Male';
run;
proc sgplot data=sashelp.class dattrmap=attrmap;
   vbar age / response=height group=sex groupdisplay=cluster 
   nooutline attrid=barcolors;
   format sex $genderfmt.;
run;

However, the output shows blue and red bars, instead of using the pink and blue values that you specified:

In this case, a format is defined to display the group values as Female and Male. The attribute map associates the group values of F and M with the pink and blue bar colors that you want, so the values do not match.

You need to change the attribute map VALUE variable so that it contains the formatted value of the GROUP= variable. Here is the first part of the code again, highlighted to show where it has changed:

With the updated values, the output now displays the correct colors:

You can find more information about attribute maps in the SG Attribute Maps section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Symbols in the output are not as desired

In the following PROC SGPLOT example, the STYLEATTRS statement defines the colors for the marker symbols on a scatter plot as either blue or pink. Also, the marker symbols should be either filled circles or filled squares:

ods html style=styles.htmlblue;
proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink) 
   datasymbols=(circlefilled squarefilled);
   scatter x=age y=height / group=sex;
run;

You submit the code using the STYLES.HTMLBLUE style. The output shows all the symbols as circles, and none are squares:

The ATTRPRIORITY ODS Graphics option determines how attributes are cycled. The default value for the ATTRPRIORITY option is defined in the style that is being used.

The STYLES.HTMLBLUE style sets the default value COLOR for the ATTRPRIORITY option. This COLOR value cycles the symbols through your specified colors before the second symbol is generated.

You want to set the ATTRPRIORITY ODS Graphics option to NONE in an ODS GRAPHICS statement. That ODS GRAPHICS statement then prevents the symbols from cycling through the colors list:

ods graphics /attrpriority=none;
ods html style=styles.htmlblue;
proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink) 
   datasymbols=(circlefilled squarefilled);
   scatter x=age y=height / group=sex;
run;

In the updated output, squares are now seen in the correct color, pink:

You can read more about how attributes are cycled in the following blog post by Rick Wicklin:
Attrs, attrs, everywhere: The interaction between ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS in ODS graphics

Annotation is not placed in the output as desired

Adding an oval

In this example, you want the resulting scatter plot to contain an oval around the circle that represents the tallest student:

proc sql;
   create table maxheight as
   select height, age
   from sashelp.class
        having height=max(height);
quit;
 
data anno;
   set maxheight;
       drawspace='datavalue';
       function='oval';
       x=age;
       y=height;
       width=6;
       height=6;
       linecolor='red';
run;
 
proc sgplot data=sashelp.class sganno=anno;
   scatter x=age y=height;
      xaxis offsetmax=0.1 offsetmin=0.1;
      yaxis offsetmax=0.1 offsetmin=0.1;
run;

After you submit the code, you notice that the resulting plot does not include the oval:

If you have used annotation in SAS/GRAPH® software, you might be accustomed to using the X and Y variables in the annotation data set to indicate the location of the annotation. However, ODS Statistical Graphics (SG) annotation uses X1 and Y1 variables for the location of the annotation.

Therefore, you need to change the X and Y variables in the annotation data set to X1 and Y1 instead:

data anno;
   set maxheight;
       drawspace='datavalue';
       function='oval';
       x1=age;
       y1=height;
       width=6;
       height=6;
       linecolor='red';
run;

In the next version of the scatter plot, the oval now appears:

Adding a text label

In this example, you want to place a text label next to the circle that represents the tallest student.

proc sql;
   create table maxheight as
   select height, age, name
   from sashelp.class
        having height=max(height);
quit;
 
data anno;
   set maxheight;
       drawspace='datavalue';
       function='label';
       x1=age;
       y1=height;
       label=name;
       textsize=10;
       textcolor='red';
       anchor='bottom';
run;
 
proc sgplot data=sashelp.class sganno=anno;
   scatter x=age y=height;
      xaxis offsetmax=0.1 offsetmin=0.1;
      yaxis offsetmax=0.1 offsetmin=0.1;
run;

However, after you run this code, the output does not include the text label:

Again, if you are a SAS/GRAPH user, you might assume that the LABEL function can place text on a plot. However, ODS SG annotation needs to use the TEXT function instead.

In the previous DATA step, you need to change the FUNCTION variable so that it contains the value TEXT:

data anno;
   set maxheight;
       drawspace='datavalue';
       function='text';
       x1=age;
       y1=height;
       label=name;
       textsize=10;
       textcolor='red';
       anchor='bottom';
run;

After you revise the DATA step and resubmit your code, you then see that the text label appears where intended:

You can find more information about SG annotation in the SG Annotation section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Summary

These are just a few examples to demonstrate some of the common output-related problems that we hear about in Technical Support. If your graphical output does not appear as you wanted, consider the options that you are using and make sure that you are using the correct option.

Learn More

How to fix common problems in output from ODS Graphics procedures was published on SAS Users.