September 13, 2019
 

Time-series decomposition is an important technique for time series analysis, especially for seasonal adjustment and trend strength measurement. Decomposition deconstructs a time series into several components, with each representing a certain pattern or characteristic. This post shows you how to use SAS® Visual Analytics to visually show the decomposition of a time series so that you can understand more about its underlying patterns.

Characteristics of time series decomposition

Time series decomposition generally splits a time series into three components: 1) a trend-cycle, which can be further decomposed into trend and cycle components; 2) seasonal; and 3) residual, in an additive or multiplicative fashion.

In additive decomposition, the cyclical, seasonal, and residual components are absolute deviations from the trend component, and they do not depend on the trend level. In multiplicative decomposition, the cyclical, seasonal, and residual components are relative deviations from the trend. As a result, the seasonal, cyclical, and residual components often have magnitudes very different from that of the trend component, which stays on the same scale as the original series.

How to begin a time series decomposition

SAS provides several procedures for time series decomposition; I use PROC TIMESERIES in this post. The first step is to decide whether to use additive or multiplicative decomposition. PROC TIMESERIES provides multiplicative (MODE=MULT), additive (MODE=ADD), pseudo-additive (MODE=PSEUDOADD), and log-additive (MODE=LOGADD) decomposition. You can also keep the default, MODE=MULTORADD, and let SAS choose based on the characteristics of your data. Conveniently, you can always apply a log transformation when you need to turn a multiplicative relationship into an additive one. The PLOTS= option in PROC TIMESERIES can produce graphs of the generated trend-cycle, seasonal, and residual components. In this post, I instead write the OUTDECOMP= data set from PROC TIMESERIES, load it, and visualize the decomposed series in SAS Visual Analytics to understand more about their patterns.

See how it's done

I decompose the time series in the SASHELP.AIR dataset as an example. The series contains monthly international air travel data from January 1949 to December 1960, as pictured below:

We see an obvious upward trend and significant seasonality in the original series, with fluctuations around the trend that grow as the level rises. This indicates that a multiplicative decomposition of the trend and seasonality components is more appropriate. I get the decomposed components with PROC TIMESERIES. I do not specify the MODE= option explicitly, so SAS uses the default, MODE=MULTORADD. Since the values in this time series are strictly positive, SAS ends up using MODE=MULT to generate the decomposed series in the OUTDECOMP= data set (see the documentation for details).
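The code listing itself is not reproduced here, so the following is only a minimal sketch of such a call, not the exact code used for this post. DATE and AIR are the variables in SASHELP.AIR; the output data set name WORK.AIR_DECOMP is an assumption made for illustration.

```sas
/* Sketch: classical decomposition of SASHELP.AIR.                 */
/* WORK.AIR_DECOMP is an assumed name for the OUTDECOMP= data set. */
proc timeseries data=sashelp.air outdecomp=work.air_decomp;
   id date interval=month;   /* monthly observations */
   var air;
   /* Request the components discussed in this post. No MODE=     */
   /* option is given, so the default MODE=MULTORADD applies.     */
   decomp orig tcc tc cc sc ic sic tcs sa;
run;
```

The resulting data set should also carry a _MODE_ variable, which you can inspect to confirm which decomposition mode was actually applied.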

When you load the data set into SAS Visual Analytics and make visualizations, it’s very straightforward to draw a time series plot for each of the decomposed series.

Note that the magnitudes of the Trend-Cycle-Seasonal and Trend-Cycle components are much larger than those of the Seasonal, Irregular and Cycle components. The upward trend and increasing volatility of the Trend-Cycle-Seasonal component reveal an obvious multiplicative composition of Trend-Cycle and Seasonal components. The formula should be: Trend-Cycle-Seasonal Component = Trend-Cycle Component * Seasonal Component.

Can you visually show the multiplicative relationship in the series?

I can easily make the log transformation of the decomposition series using the calculated item in SAS Visual Analytics, and accordingly show the additive relationship of the transformed series. The visualization below shows the additive relationship of the log transformation of the Trend-Cycle-Seasonal component with the log transformations of Trend-Cycle component and Seasonal component, which is the equivalent of the pre-transformed multiplicative relationship.

In the visualization below, the moss-green line series at the bottom of the chart shows the Log Seasonal component, with each vertical black line representing its value. The lines at the top show that adding the value of the mint-green vertical lines (the Log Seasonal component) to the orange line series (the Log Trend-Cycle component) gives the pine-green line series (the Log Trend-Cycle-Seasonal component).

In the list table, note that the value of the calculated item 'Trend-Cycle Component * Seasonal Component' is equal to the 'Trend-Cycle-Seasonal Component' value highlighted in blue, which confirms that the 'Trend-Cycle-Seasonal Component' is the product of the 'Trend-Cycle Component' and the 'Seasonal Component.' Likewise, the sum of the calculated items 'Log Trend-Cycle Component' and 'Log Seasonal Component' is equal to the value of 'Log Trend-Cycle-Seasonal Component' in light green. Together, these verify the multiplicative and additive relationships, respectively.
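The same check can be made outside SAS Visual Analytics. The sketch below is an illustration in a DATA step rather than the calculated items used in the report; the data set names WORK.AIR_MULT and WORK.AIR_CHECK and the new variable names are assumptions, while TCC, SC, and TCS are the component variables written to the OUTDECOMP= data set.

```sas
/* Sketch: verify the multiplicative and log-additive relationships. */
/* WORK.AIR_MULT and WORK.AIR_CHECK are assumed data set names.      */
proc timeseries data=sashelp.air outdecomp=work.air_mult;
   id date interval=month;
   var air;
   decomp tcc sc tcs / mode=mult;   /* force multiplicative mode */
run;

data work.air_check;
   set work.air_mult;
   log_tcc   = log(tcc);                      /* Log Trend-Cycle          */
   log_sc    = log(sc);                       /* Log Seasonal             */
   log_tcs   = log(tcs);                      /* Log Trend-Cycle-Seasonal */
   prod_diff = tcs - tcc*sc;                  /* multiplicative check     */
   log_diff  = log_tcs - (log_tcc + log_sc);  /* additive check on logs   */
run;

/* Both differences should be zero up to floating-point rounding. */
proc means data=work.air_check min max;
   var prod_diff log_diff;
run;
```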

More ways to expose and view patterns

Besides the above multiplicative decomposition, we can dig for more multiplicative or additive relationships from the original series and the decomposed series. Here are the formulas:

Original Series = Trend-Cycle-Seasonal Component * Irregular Component

Seasonal-Irregular Component = Seasonal Component * Irregular Component

Original Series = Seasonal Adjusted Series * Seasonal Component

Trend-Cycle Component = Trend Component + Cycle Component [1]

[1] Note: Despite the MODE=MULT option, PROC TIMESERIES uses the Hodrick-Prescott filter, which always decomposes the trend-cycle component into the trend component and cycle component in an additive fashion.

Because the decomposed data set has the same fixed structure for any time series, as shown below, we can easily reuse the visualizations in SAS Visual Analytics for the decomposed series of other time series. When you apply a new data set, all the calculated items are inherited and the visualizations update with the new data automatically. That’s what I like most about visualizing time series decomposition in SAS Visual Analytics.

A final decomposition comparison

Let’s compare the multiplicative and additive decompositions of the same series. Note that the Trend-Cycle components (as well as the Trend and Cycle components) are the same in both decompositions, so the two differ only in how the seasonal component, and with it the irregular component, is separated out.

In the screenshot below, we see that the two seasonal components fluctuate in a similar seasonal pattern, but their values differ considerably between the multiplicative and additive decompositions. The choice of decomposition method also leads to different Trend-Cycle-Seasonal, Irregular, and Seasonal-Irregular components. In addition, we can still see some patterns in the Irregular component from the additive decomposition.

In the multiplicative decomposition, by contrast, the Irregular component looks more random. Thus, multiplicative decomposition is a better choice than additive decomposition for the SASHELP.AIR time series.
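To reproduce this comparison, one option is to run the decomposition once per mode and merge the results by date. This is a sketch under assumptions: the data set names and the _MULT/_ADD suffixes are hypothetical, and it relies on the ID variable DATE being present in each OUTDECOMP= data set.

```sas
/* Sketch: decompose SASHELP.AIR in both modes and line up the results. */
/* All WORK data set names and the renamed variables are assumptions.   */
proc timeseries data=sashelp.air outdecomp=work.decomp_mult;
   id date interval=month;
   var air;
   decomp tcc sc ic / mode=mult;
run;

proc timeseries data=sashelp.air outdecomp=work.decomp_add;
   id date interval=month;
   var air;
   decomp tcc sc ic / mode=add;
run;

/* One row per month, with both versions of each component side by side */
data work.decomp_compare;
   merge work.decomp_mult(keep=date tcc sc ic
                          rename=(tcc=tcc_mult sc=sc_mult ic=ic_mult))
         work.decomp_add (keep=date tcc sc ic
                          rename=(tcc=tcc_add sc=sc_add ic=ic_add));
   by date;
run;
```

Keep in mind that the multiplicative seasonal and irregular components are ratios while the additive ones are in the units of the original series, so they are best plotted on separate axes.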

PROC Timeseries provides classical decomposition of time series, and SAS has other procedures that can perform more complex decomposition of time series. If you want to visualize time series decomposition in a way you like, give SAS Visual Analytics a try!

How to Visualize Time Series Decomposition using SAS Visual Analytics was published on SAS Users.

September 11, 2019
 

Portion of Figure 3 by Bull et al. (Nature, 2019)

I often use axis tables in PROC SGPLOT in SAS to add a table of text to a graph so that the table values are aligned with the data. But axis tables are not the only way to display tabular data in a graph. You can also use the TEXT statement, which supports many features that are not available in axis tables, such as rotating the text. Recently I saw some graphs by Bull, et al. (Fig 3, Nature, 2019) in which a table was presented in an interesting way. Part of the graph is reproduced at the right.

The complete graph includes ages from 18 to 45 years old, so there are many horizontal categories and they are very close together. For each age, the graph shows the mean luteal-phase length for women in a study about menstrual cycles. (The luteal-phase length is the number of days between ovulation and menstruation.) The numbers indicate the number of women (first number) and the number of menstrual cycles (second number) from which the mean is calculated. The vertical bars are a 95% confidence interval (CI) for the mean.

This article uses the SGPLOT procedure in SAS to create three graphs that are similar to this one:

  1. When the numbers in a table are small (a few digits), you can use the standard X axis table to show the numbers. However, an X axis table isn't effective for this particular graph because the numbers have too many digits to fit into the available horizontal space.
  2. One alternative (used by Bull, et al.) is to rotate the text.
  3. A second alternative is to rotate the entire plot and use a Y axis table to present the data table.

Use an XAXISTABLE statement

The authors did not make the data publicly available, so I estimated values from the chart to create some similar-looking data. I did not want to type in the means and CIs for all 28 age groups, so I stopped at Age=26.

/* Data based on Fig 3 of
   Bull, et al. (2019)
   "Real-world menstrual cycle characteristics of more than 600,000 menstrual cycles"
   npj Digital Medicine
   https://www.nature.com/articles/s41746-019-0152-7
*/
data menstrual;
input Age mean low high Users Cycles;
label mean = "Luteal length";
datalines;
18 12.1  11.3  12.8    46    123
19 12.07 11.75 12.3   354   1082
20 12.2  11.97 12.33  811   2547
21 12.18 12.0  12.25 1535   4925
22 12.27 12.15 12.4  2425   8786
23 12.25 12.2  12.3  3527  13579
24 12.28 12.22 12.39 4693  19749
25 12.3  12.24 12.39 5966  27000
26 12.33 12.3  12.42 7718  35845
;
 
ods graphics / width=270px height=480px;  /* make sure there isn't much room between age groups */
 
/* first attempt: Use XAXISTABLE to position text that shows Users and Cycles for each age */
title "Show Table of Counts";
title2 "XAXISTABLE Statement";
proc sgplot data=menstrual noautolegend;
   scatter x=Age y=mean / yerrorlower=low yerrorupper=high errorbarattrs=GraphData1;
   xaxistable Users Cycles / x=Age location=inside;  /* can use POSITION=TOP */
   xaxis grid min=18 offsetmin=0.1 offsetmax=0.1;
   yaxis grid;
run;

For graphs like this, the XAXISTABLE statement should usually be the first statement you try because it is so simple to use. In the XAXISTABLE statement, you list each variable that you want to display (Users and Cycles) and specify the variable for the horizontal positions (Age). The result is shown above. You can see that for these data, the cells in the axis table overlap each other and are unreadable because the distance between groups is so small.

Clearly, this first attempt does not work for this table. And because this table is intended to fit on one slide or piece of paper, you cannot simply make the graph wider to prevent the overlap. Instead, an alternative approach is required.

Use rotated text

The authors chose to display the Users and Cycles data above each age group by using rotated text. To do this in SAS, you need to add two new variables to the data set. The first is a character variable that contains the comma-separated values of Users and Cycles. The second is the location for the text, in data coordinates. The following DATA step uses the CATX function to concatenate the data values into a comma-separated string. The height of the text string is set to 13 for all groups in this example, but you could make the height depend on the value of the mean if you prefer.

/* second attempt: Use TEXT and ROTATE=90 to position text */
data menstrual2;
set menstrual;
length Labl $20;
Labl = catx(", ", Users, Cycles); /* concatenate: "123, 4567" */
Height = 13;               /* this variable can depend on Age */
run;
 
title2 "TEXT Statement, ROTATE=90";
proc sgplot data=menstrual2 noautolegend;
   scatter x=Age y=mean / yerrorlower=low yerrorupper=high errorbarattrs=GraphData1;
   text x=Age y=Height text=Labl / position=right rotate=90 
              backfill fillattrs=(color=white);
   yaxis grid offsetmin=0.1;
   xaxis grid min=18 offsetmin=0.1 offsetmax=0.1;
run;

When you use the ROTATE= option to display rotated text, it is important to understand how the text is positioned relative to the coordinates that you specify. The coordinates (in this case, (Age, Height)) determine an anchor point. The POSITION= option specifies how the text is positioned relative to the anchor point BEFORE the rotation. Therefore, the combination POSITION=RIGHT ROTATE=90 results in text that is positioned above the anchor point.
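As noted earlier, the anchor height does not have to be constant. The following sketch places each label just above the upper confidence limit instead of at a fixed height. It reuses the MENSTRUAL data set defined above; the data set name MENSTRUAL3, the 0.05 gap, and the OFFSETMAX= value are assumptions that you would probably need to tune for your own graph size.

```sas
/* Sketch: let the anchor height depend on the data.                  */
/* MENSTRUAL3, the 0.05 gap, and OFFSETMAX=0.3 are illustrative only. */
data menstrual3;
set menstrual;
length Labl $20;
Labl = catx(", ", Users, Cycles);  /* same label as before */
Height = high + 0.05;              /* just above the upper CI limit */
run;

title2 "TEXT Statement, Height Depends on CI";
proc sgplot data=menstrual3 noautolegend;
   scatter x=Age y=mean / yerrorlower=low yerrorupper=high errorbarattrs=GraphData1;
   text x=Age y=Height text=Labl / position=right rotate=90;
   yaxis grid offsetmax=0.3;       /* leave room above for the rotated text */
   xaxis grid min=18 offsetmin=0.1 offsetmax=0.1;
run;
```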

Use a YAXISTABLE statement

There is another option, which is to rotate the entire graph. Rather than specifying Age as the horizontal variable and using vertical bars for the CIs, you can specify Age as the vertical variable and use horizontal bars. This results in a graphic that is long rather than wide. There should be enough horizontal space to include two columns of text that show the data for Users and Cycles. Because a printed page (in portrait mode) is longer than it is wide, the graph will probably fit on a standard sheet of paper. However, it might not fit on a slide, which is wider than it is tall.

The following call to PROC SGPLOT creates a rotated version of the graph. In addition to rotating the graph, I add alternating bands of gray so that the reader can more easily associate intervals with age groups.

/* third attempt: Rotate plot, use YAXISTABLE, add alternating bands */
ods graphics / width=480px height=300px;
title2 "YAXISTABLE Statement";
%macro HalfWidth(nCat);
   %sysevalf(0.5/&nCat)
%mend;
 
proc sgplot data=menstrual noautolegend;
   scatter y=Age x=mean / xerrorlower=low xerrorupper=high errorbarattrs=GraphData1;
   yaxistable Users Cycles / y=Age location=inside position=left valueattrs=(size=9);  
   yaxis grid reverse type=discrete discreteorder=data fitpolicy=none 
      offsetmin=%HalfWidth(9) offsetmax=%HalfWidth(9) /* half of 1/k, where k=number of categories */
      colorbands=even colorbandsattrs=(color=gray transparency=0.9);
   xaxis grid;
run;

Rotating the graph makes the table of numbers easier to read. The graph for the full data will be somewhat long, but that is not usually a problem for the printed page or for HTML. The main drawback is that long graphs might not fit on a slide for a presentation. A second drawback is that the authors wanted to show that the luteal-phase length depends on age, and it is traditional to plot independent variables (age) horizontally and dependent variables (luteal length) vertically.

In summary, this article shows three ways to add tabular data to a scatter plot with error bars. The first way is to use the XAXISTABLE statement, which works when the table entries are not too wide relative to the horizontal spacing between groups. The second way is to rotate the text, as done in the Nature article. The third way is to rotate the plot so that the error bars are shown horizontally rather than vertically. This third presentation is further enhanced by adding alternating bands of color to help the reader distinguish the age categories. (You can use alternating color bands for the XAXISTABLE, too.)

All three methods are useful in various circumstances, so remember to consider all three methods when you design graphs like this.

To learn more about using horizontal and vertical axis tables in SAS, see Chapter 3 of Warren Kuhfeld's free e-book Advanced ODS Graphics Examples.

The post Axis tables versus rotated text: How to display a wide table in a small graph appeared first on The DO Loop.

September 10, 2019
 

I recently had the incredible opportunity to attend SAS Global Forum in Dallas as a presenter and New SAS Professional Award recipient. At the conference, I was able to learn more about SAS features and applications, share my knowledge of SAS applications in the clinical trials space, and make new professional connections.

Here are 11 reasons why you should consider applying for this award, too.

1) Free registration & conference hotel: The obvious perk for award winners is the waived fees associated with the cost of attending the conference, including the registration fee, pre-conference tutorial, and free stay at the conference hotel for award winners who are also presenters. As a junior-level employee, it can be difficult to convince your department to allow you to travel to a conference, but it makes it a lot easier to pitch the idea when an award covers most of the costs.

2) See a new city: I arrived at the conference a day early, so I was able to take advantage of my time in Dallas to see the city. I walked around downtown, toured the Dallas Arboretum and Botanical Garden, and ate some delicious barbeque. SAS Global Forum 2020 will be held in Washington D.C., so there will be plenty of sights to see there as well.

3) Receive guidance from a mentor: Award recipients who publish and present a paper are eligible to be matched with a mentor through the Presenter Mentoring Program. My mentor, Chris Battiston, was incredibly friendly, helpful, and personable. He provided advice on presentations to attend, public speaking tips, and even referred me for an opportunity to fly out to Canada as an invited speaker at the SAS Canada Health Users group conference. Having a mentor helped set my expectations for the conference and make a plan to maximize my experience.

4) Open doors to additional opportunities: This award, and my associated presentations, gave a huge boost to my credibility and to the publicity around my work. As a direct result of presenting at this conference and receiving the award, I received invitations to speak on the main stage in front of 5,000+ people at SAS Global Forum 2019, to attend the SAS Canada Health Users Group as an invited speaker, to serve as a panel speaker at the Research Triangle SAS Users Group, and to attend SAS Global Forum 2020 as an invited speaker. I also had opportunities to meet Jim Goodnight and other SAS executives, which was an incredible honor.

5) Speak with SAS employees: Have a question about a SAS procedure? At SAS Global Forum, you can ask your question to the actual developers of those procedures in The Quad. The Quad is a large exhibit and demo area with dozens of SAS booths as well as the conference sponsors. At the booths, I spoke to quite a few representatives from SAS and learned about the variety of areas where SAS is making an impact. I learned about the features and functions of SAS Viya, efforts at SAS to make data visualization accessible to those who are visually impaired, the rationale behind moving the certification exams to a performance-based format, and the free SAS-supported software platform to teach coding to children at a young age.

6) Free swag: Not the most important reason, but still an awesome bonus! I walked away from the conference with two free t-shirts, a backpack from the Pinnacle Solutions sponsor booth, and many trinkets, pens, and notepads collected from the various booths.

7) Have fun: There were quite a few events at the conference that were a lot of fun! It was easy to meet people because everyone at the event was so friendly. There were happy hour events, lunch networking groups where you could sit with a table of people based on common interests, escape rooms, get-togethers for SAS regional user groups, and a big party for all conference attendees on the last night. It is a great opportunity to spend time with the people you meet at the conference.

8) Practice public speaking skills & teach others: Presenting at the conference is a great opportunity to practice speaking in front of a large group and to teach other professionals about some aspect of SAS. As a "New" SAS professional, it may sound daunting to come up with a topic that would be useful for a more experienced audience, but you'd be surprised at the number of people who attend the conference with no knowledge of many of the base procedures. Additionally, conference attendees find it incredibly valuable to learn about how SAS can be used to solve a problem or how an existing common task can be programmed more efficiently. My topic was "Using PROC SQL to Generate Shift Tables More Efficiently", and it taught programmers and statisticians a shorter way to produce shift tables, which are commonly used to present categorical longitudinal data. Because of the preparation I put in to present at the conference, I left the event as a much more confident speaker than I had ever been before.

9) Learn something new: At the conference, you'll have the opportunity to attend sessions on virtually any topic you can think of that is related to SAS. Most of the talks I attended were related to statistics because the topic aligns with my job description as a Biostatistician. Some of the topics I learned about were Bayesian analysis, missing data, survival analysis, clinical graphs, and artificial intelligence. Additionally, the conference allows you opportunities to ask specific questions about any SAS procedure or task you’re struggling with. A resource available at the conference is the “Code Doctors” table in The Quad, where you can ask programming questions to SAS experts. I had the opportunity to serve as a “Resident” for the Code Doctors program and was able to observe and help those who needed advice.


10) Increase visibility within your company: I was the only attendee from my office, but there were several senior-level IQVIA employees from other regions in attendance, and I had the opportunity to meet them and spend time with them at the conference. I work at a very large company and would not have had the opportunity to meet these coworkers otherwise, so it was an excellent opportunity to increase my visibility even within my company. Additionally, I’ve had opportunities to apply the knowledge I gained from the conference at work and to share advice with coworkers based on what I learned.

11) Make new connections: Perhaps the most important reason to attend SAS Global Forum as a New SAS Professional is the connections you make at the conference. There are opportunities to meet people at all stages of their careers who use SAS to complete statistical analysis. Although we work in different industries, I found that many conference attendees used the same procedures and dealt with the same issues that I did, and I truly felt a sense of community among the long-time attendees. Like most of the programmers, analysts, and statisticians in attendance, I do my day-to-day work in a solitary environment on the computer. Although project teams involve teamwork, there is not a great amount of face-to-face interaction. I love connecting with other people, and this conference gave me the opportunity to meet other people working in similar positions.

The New SAS Professional Award is perfect for those with the potential to become a leader in their field and who are looking for more opportunities to present their ideas, to network and make connections, and to learn from experts.

This experience has allowed me to expand my skills and network, and served as a launchpad for my successful career. My attendance at the conference has allowed me to feel a greater sense of community with other SAS users, and to serve as a representative from the "next generation" of SAS Professionals. I encourage you to submit your abstract by September 30th and your award application by November 1st if this seems like the right opportunity for you. More details about this award and other award and scholarship opportunities are available on the SAS Global Forum 2020 website.

11 Reasons to Apply for the New SAS Professional Award was published on SAS Users.

September 10, 2019
 

SASPy is a powerful Python library that interfaces with SAS and can help with your machine-learning solutions. SASPy was created for Python programmers to leverage the power of SAS within their Python scripts. If you are not familiar with SASPy, see the following resources:

This blog post shows you how powerful SASPy can be. SASPy helps you produce visuals and descriptive statistics quickly and accurately. To demonstrate this capability, let’s explore and prepare your data using SASPy.

Prerequisites

To get started, here is what you need:

  • The Census Income data set from the University of California Irvine’s Machine Learning Repository
    • Download the adult.data data set from the data folder.
    • Remove the missing values prior to exploring and preparing.
  • SAS® 9.4 or SAS® Viya® 3.1, or any later release of either
  • Jupyter Notebook
  • SASPy (To install SASPy, refer to the installation and configuration documentation.)

After verifying you have completed the above requirements, you can start your Jupyter Notebook and begin coding using SASPy.

Let's start by importing the libraries used in this example.

  1. Import the libraries:
  2. Start your SAS session. Use the command below to establish a connection.

A "SAS Connection established" message is returned once you are connected. This example uses a local connection to SAS. However, you can use an STDIO connection or an IOM connection to SAS if you prefer. For more information, see SAS Configuration.

  3. Read in your data set. You have two options: You can either read in the data set using pandas and then read the data into a SAS data object or you can read it directly into a SAS data object. This example shows reading the data directly into a SAS data object.

To access existing data in a SAS session, use the SAS data object. A SAS data object can be used to do the following:

  • Create various graphs such as histograms, scatter plots, heatmaps, and so on.
  • Display descriptive statistics.
  • Transfer data between a pandas data frame and a SAS data object.

The SAS data object is versatile. To view all of its capabilities, refer to the SAS Data Object documentation.

  4. Verify whether you successfully read in your data set:

Like pandas, SASPy has a head function that displays the first few rows of the data. The only difference is in how you specify the number of rows: with a SAS data object, you need to pass it as “obs=n”.

Exploring your Data

SASPy provides many options to explore your data. This example uses a combination of SASPy functions and pandas to explore the data.

  1. Determine the number of records in your data:
  2. Determine how many individuals earn more or less than $50,000. For this step, this example uses pandas to demonstrate how you can switch between using SASPy and pandas seamlessly.
    1. Change your SAS data object into a pandas data frame:
    2. Use the value_counts function to determine how many individuals earn more or less than $50,000:
    3. View the percentage of individuals whose income is greater than $50,000:
    4. Display all your values to gain an understanding of your data:

As you can see from the output above, there are 30,162 records. Of these, 7,508 individuals earn more than $50,000, and 22,654 individuals earn up to $50,000. In other words, about 25% of individuals earn more than $50,000.
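The figures quoted above can be cross-checked with a few lines of plain Python. This is only an arithmetic sketch that uses the counts stated in the text; it is not the SASPy or pandas code from the original notebook, and the variable names are illustrative.

```python
# Counts quoted in the text above (not recomputed from the data set).
over_50k = 7_508      # individuals earning more than $50,000
up_to_50k = 22_654    # individuals earning up to $50,000

# The two groups should add up to the total number of records.
total = over_50k + up_to_50k

# Share of individuals in the higher income group, as a percentage.
share_over = over_50k / total * 100

print(f"total records: {total:,}")
print(f"share earning more than $50,000: {share_over:.1f}%")
```

The two counts add up to the 30,162 records reported, and the share comes out just under 25%, which is consistent with the rounded figure in the text.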

  3. It is also important to look at your numerical features. Use SASPy to get a quick description of your data:

As you can see above, the table lists the mean, the median, and other summary statistics.

Exploring your data is just the first step in generating your machine-learning solutions. This blog post described how to generate basic statistical values and display output using SASPy, pandas, and Python. Parts 2 and 3 of this series cover how to prepare your data using SASPy and then apply it to a machine-learning model.

For more information about the data set, see the UC Irvine Machine Learning Repository.

Machine learning with SASPy: Exploring and preparing your data (part 1) was published on SAS Users.

September 9, 2019
 

If you consume NBA content through social media, then you know just how active that online community is. Basketball arguments and ‘hot takes’ on the Internet are about as commonplace as Michael Jordan playing golf instead of running a functional NBA front office. I wondered if NBA fans happened to [...]

The Memphis Grizzlies have the best NBA arena. Here's why was published on SAS Voices by Frank Silva

September 9, 2019
 

Editor's Note: This article was translated and edited by SAS USA and was originally written by Makoto Unemi. The original text is here.

SAS previously provided SAS Scripting Wrapper for Analytics Transfer (SWAT), a package for using SAS Viya functions from various general-purpose programming languages such as Python.

In addition to SWAT, SAS launched Deep Learning Python (DLPy), a higher-level API package for Python, making it possible to use SAS Viya functions more efficiently from Python. In this article, I outline what DLPy is and describe its implementation.

About DLPy

DLPy is a high-level Python API package created for the deep learning and image action sets in SAS Viya 3.3 and later. DLPy provides an API similar to Keras to make deep learning and image processing code more efficient to write. With only minor rewriting, existing Keras code can be run on SAS Viya.

For example, below is a Convolutional Neural Network (CNN) layer definition; you can see that it is very similar to Keras.

The layers supported by DLPy are: InputLayer, Conv2d, Pooling, Dense, Recurrent, BN, Res, Proj, and OutputLayer. The following is an example of training.

DLPy functions

This section introduces some of DLPy's functions (partial excerpts), using as an example a CNN that is trained on multiple dolphin and giraffe images and then applied to test images.

Implementation of major deep learning networks

DLPy offers the following pre-built deep learning models: VGG11/13/16/19, ResNet34/50/101/152, wide_resnet, and dense_net.

The following models also offer weights pre-trained on ImageNet data (these weights can be adapted to your own tasks through transfer learning): VGG16, VGG19, ResNet50, ResNet101, and ResNet152. The following is an example of transferring ResNet50 pre-trained weights.

Basis for CNN decisions

Using the heat_map_analysis() method, you can output a color heat map and check which regions of the image the model focused on.

In addition, you can use the get_feature_maps() method to get the feature maps of each CNN layer, and the feature_maps.display() method to select one of those layers and display its feature map for inspection.

The following is the feature map output for layer 1.

The following is the feature map output for layer 18.

Support functions for deep learning and image processing tasks

resize() method: Resizes image data

as_patches() method: Augments image data (generates patches from the original image)

two_way_split() method: Splits data (training, testing)

plot_network() method: Draws the structure of the defined deep learning layers (network) as a graphical diagram

plot_training_history() method: Displays the training history across iterations

predict() method: Displays prediction (scoring) results

plot_predict_res() method: Displays classification results

And of course, you can use DLPy to get data from a SAS Viya in-memory session, pass it to your local client, and convert it to common data formats like NumPy arrays and pandas DataFrames. The converted data can then be passed directly to models from other open source packages such as scikit-learn.

Videos on image classification using DLPy are also available in the Deep Learning with Python (DLPy) Demo Series section of the DLPy product page.

SAS Viya: Package for Python API for deep learning and image processing: DLPy was published on SAS Users.

September 9, 2019
 

The TEXT statement in PROC SGPLOT supports the ROTATE= option to rotate the specified text. It is worth knowing how the ROTATE= option interacts with the POSITION= option, which determines the anchor point at which the text is positioned. Briefly, the text is positioned FIRST, then the rotation occurs. The following list summarizes the interaction for common choices of the POSITION= option:

  • POSITION=LEFT: The text is positioned to the left of the anchor point, then rotated around the anchor point. For example, if you rotate by 90 degrees, the text will be below the anchor point.
  • POSITION=RIGHT: The text is positioned to the right of the anchor point, then rotated around the anchor point. For example, if you rotate by 90 degrees, the text will be above the anchor point.
  • POSITION=BOTTOM: The text is positioned below the anchor point, then rotated around the anchor point. For example, if you rotate by 90 degrees, the text will be to the right of the anchor point.
  • POSITION=TOP: The text is positioned above the anchor point, then rotated around the anchor point. For example, if you rotate by 90 degrees, the text will be to the left of the anchor point.
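The position-then-rotate rule in the list above can be checked with a little plane geometry. The following Python sketch is not part of the original SAS program: it assumes the anchor is at (0,0), represents each POSITION= choice by a hypothetical unit offset of the text's center from the anchor, and assumes positive angles rotate counterclockwise, which is the convention consistent with the outcomes listed above. The names rotate_about_anchor, POSITION_OFFSETS, and describe are illustrative only.

```python
import math

def rotate_about_anchor(offset, degrees):
    """Rotate a 2-D offset from the anchor counterclockwise by `degrees`."""
    dx, dy = offset
    t = math.radians(degrees)
    x = dx * math.cos(t) - dy * math.sin(t)
    y = dx * math.sin(t) + dy * math.cos(t)
    # Round away floating-point noise; adding 0.0 turns -0.0 into 0.0.
    return (round(x, 9) + 0.0, round(y, 9) + 0.0)

# Hypothetical unit offsets of the text's center relative to the anchor (0,0),
# i.e. where the text sits BEFORE any rotation is applied.
POSITION_OFFSETS = {
    "LEFT": (-1, 0),
    "RIGHT": (1, 0),
    "BOTTOM": (0, -1),
    "TOP": (0, 1),
}

def describe(point):
    """Name the side of the anchor on which a point mainly lies."""
    x, y = point
    if abs(x) > abs(y):
        return "right" if x > 0 else "left"
    return "above" if y > 0 else "below"

# Position first, then rotate by 90 degrees around the anchor.
for position, offset in POSITION_OFFSETS.items():
    print(position, "->", describe(rotate_about_anchor(offset, 90)))
```

Under these assumptions, a 90-degree rotation moves LEFT to below, RIGHT to above, BOTTOM to the right, and TOP to the left of the anchor, in agreement with the list.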

This summary is best illustrated by showing examples of rotated text. In the following example, the anchor point is always (0,0). The program uses either POSITION=LEFT or POSITION=RIGHT to position a text string, which is then rotated through a series of angles.

/* One observation per rotation angle; every label is anchored at (0,0) */
data TextRotate;
retain x y 0;
length Labl $ 30;
do Angle = 0 to 315 by 45;
   Labl = catx(' ', '---', 'Angle', Angle, '---');
   output;
end;
run;
 
ods graphics / width=300px height=300px;
%macro RotateTextPlot(position);
title "Rotated Text - POSITION=&position";
proc sgplot data=TextRotate noautolegend;
   scatter x=x y=y / markerattrs=(symbol=CircleFilled);
   text x=x y=y text=labl / rotate=Angle position=&position textattrs=(size=12);
   yaxis display=none;
run;
%mend;
 
%RotateTextPlot(LEFT);
%RotateTextPlot(RIGHT);

In a similar way, the following DATA step creates a series of anchor points on the unit circle. The program uses either POSITION=BOTTOM or POSITION=TOP to position a text string, which is then rotated by an angle.

/* Place the anchor points on the unit circle, one per rotation angle */
data TextRotate;
length Labl $ 30;
pi = constant('pi');
do Angle = 0 to 315 by 45;
   x = cos(Angle*pi/180); y = sin(Angle*pi/180);
   Labl = catx(' ', '---', 'Angle', Angle, '---');
   output;
end;
run;
 
%RotateTextPlot(BOTTOM);
%RotateTextPlot(TOP);

Be aware that this behavior is different from the way that text rotation works in SG annotation. To see how the anchor point (the value of the ANCHOR variable) interacts with the rotation angle (the ROTATE variable) in an SG annotation, see pp. 113–116 in Warren Kuhfeld's free e-book, Advanced ODS Graphics Examples, which inspired the graphs in this blog post. Briefly, if you anchor text on the left, then the text is positioned to the right, and vice versa.

The post Anchor points and rotated text in PROC SGPLOT appeared first on The DO Loop.