books

11月 012022
 

SAS SQL handles missing values differently than the ANSI standard for SQL. PROC SQL follows the SAS convention for handling missing values: numerical missing values are always interpreted as less or smaller than all nonmissing values. My first blog showed that missing values can be troublemakers in non-grouped descriptive statistics. This blog now focuses on how to handle tables if they contain missing values.

In a single table

Missing values can lead to unexpected results while working with only one table, for example, in WHERE conditions, when sorting data using ORDER BY, and when grouping data using GROUP BY. (See the following examples.)

WHERE

If a column contains missing values, a WHERE condition can lead to undesirable results under certain circumstances. If you want to query all S0666 values smaller than 1.5 in the example code below, then a simple WHERE would also include missing values.

To keep S0666 values less than 1.5, but without missing values, you can extend the WHERE condition by adding IS NOT MISSING

Example: WHERE

proc sql ;
  select * 
  from SASHELP.SYR1001
  where S0666 < 1.5 
            and S0666 is not missing ;
quit ;

ORDER BY

When sorting data in ascending order, PROC SQL places missing values before data of type numeric or string. If that ORDER BY does not contain any other column to sort by, the values in other columns are only sorted if you expressly do so. If your data contain missing values, you may have to add further sorting variables to ORDER BY after the column containing missing values, for example, T and S0502 in the example below.

Example: ORDER BY

proc sql ;
  select * 
  from SASHELP.SYR1001
  order by S0666, T, S0502 ;
quit ;

GROUP BY

Missing values have subtle effects that may lead to fake statistics. The following example demonstrates the effect that missing values have on a seemingly simple calculation of means. To demonstrate, I replaced the missing values in column S0666 with 0s and saved the new column as S0666_0. The means of both S0666 and S0666_0 are calculated based on the sum of all available values divided by the number of nonmissing values data rows.

Example: GROUP BY

proc sql ;
       select T, S0666, 
	       avg(S0666) as S0666_MEAN,  
                case
		    when S0666 is missing then 0
                         else S0666 
                  end as S0666_0, 
          avg(calculated S0666_0) as S0666_0_MEAN 
	from SASHELP.SYR1001 ;
quit ;

Output

                   S0666_              S0666_0_
 T      S0666       mean     S0666_0       mean       

 52    1.52000    1.75444    1.52000    0.90229
 53    1.84000    1.75444    1.84000    0.90229
 54    1.96000    1.75444    1.96000    0.90229
 55     .         1.75444    0.00000    0.90229
 56     .         1.75444    0.00000    0.90229
 57     .         1.75444    0.00000    0.90229
 58     .          ...

Because the denominator for S0666_0_mean is larger, the resulting mean is (not surprisingly) lower in the output. However, S0666_mean also returns means for rows that did not contribute values to its calculation (only for the denominator, if you will).

When joining tables

If you are about to join tables, you need to check whether the keys of the tables to be joined contain missing values. And if they contain missing values, you need to check if they are of the same type. SAS defines missing values in different ways. Unfortunately, a numerical column may contain different types of missing values at the same time. Missing numerical values and missing strings are handled completely differently.

Even if your data contain only one type of numerical or string missing value, you may be in for a big surprise. In the following example, the data sets ONE and TWO are identical except for the names of the columns in TWO. Each second and third row have missing values in the key; the following example highlights its effect.

 

Data ONE Data TWO
data ONE ;
   input ID A B C ;
datalines;
01 1   2  3
.  1   2  3
.  99 99 99
05 1   2  3
;
run ;
data TWO ;
   input ID D E F;
datalines;
01 1   2  3
.  1   2  3
.  99 99 99
05 1   2  3
;
run ;

 

Example: Join

proc sql ; 
create table FJ as select
a.ID, A,B,C,D,E,F
from ONE a full join TWO b
on a.ID=b.ID
order by ID ;
quit ;

 

Output

ID     A     B     C     D     E     F
 .     1     2     3    99    99    99
 .    99    99    99     1     2     3
 .    99    99    99    99    99    99
 .     1     2     3     1     2     3
 1     1     2     3     1     2     3
 5     1     2     3     1     2     3

The missing values in the key of ONE and TWO generate a Cartesian product. Each missing value of ONE is combined with each missing value of TWO (the same effect applies to non-missing values as well). With 2 times 2 missing values, two missing values are enough to multiply the associated data rows. Please note also the limited effect of ORDER BY in the presence of missing values. Beware if you may have more than one missing value in keys and are about to join one or more tables.

Have you had unexpected results from missing values in your data when using PROC SQL? You can learn more about effects caused by missing values in Chapter 2 of my book Advanced SQL with SAS.

Unexpected results from missing values with PROC SQL was published on SAS Users.

10月 032022
 

We publish a lot of books by SAS experts at SAS Press, but how does someone become an expert in the first place? Becoming certified is one step, but who develops the certifications in the first place? Those are the true experts. They have to have a deep understanding of SAS, an ability to explain difficult concepts, and a passion for sharing their knowledge. Our SAS Press Author of the Month for October is Jane Eslinger, also known around the office as the PROC REPORT expert.

Jane has written two books on PROC REPORT that are the go-to reference for anyone who wants to understand all of the capabilities of the REPORT procedure. I asked her a few questions about how she became the PROC REPORT expert and what advice she has for SAS novices.

Q: What made you decide to write a book about PROC REPORT?

At the time I was in SAS Technical Support and answering customer questions on PROC REPORT. I tell people that PROC REPORT has a large learning curve. I knew PROC REPORT but discovered that information about it was spread across a wide variety of places – SAS notes, conference papers, and SAS Communities posts. I decided that I wanted to put all of the information in one place, so I wrote the first book.

Q: Have you noticed any new trends in reporting in the last few years?

Good question. I would say that programmers are more interested in making drill-down or cascading reports. People want to click from one report to the next, more refined report.

Q: What do you think is the most misunderstood feature of PROC REPORT?

Two things, actually. First, is the compute block. Programmers know that they can assign or change values within the compute block. They also know it can be used to affect styling. But it takes practice to get all of it right. And lots of programmers use compute blocks for the bare essentials but it can do so much more.

Second, is the COLUMN statement. Programmers want to control what is in the header section of the report but do not understand how do use spanning headers and null headers to get what they want.

Q: How did you first learn SAS?

I had to use it in my regression statistics course in college, as well as take a basic programming class for my degree.

Q: As an instructor, how do you encourage students who get frustrated when learning SAS?

I tell them that they are learning a new language or a new skill. You aren’t going to get it right the first time! I also tell them that programmers with 20 years of experience still have to look up syntax.

Q: What is your best advice for someone who wants to learn SAS?

Practice. But really the best way to practice is to have a project (school or at work) that requires you to use SAS. Learning SAS is much easier when you can connect programming concepts to your own project instead of arbitrary data.

Q: Just for fun, what is the last non-technical book you read that you enjoyed?

Wish You Were Here by Jodi Picoult.


Thanks for talking to me, Jane! Visit Jane’s author page to find an exclusive discount code that you can use to purchase her e-books in the month of October.

You can also find her at SESUG 2022 where she will be giving a hands-on training session on how to create PROC REPORT code for multiple report types. And check out Jane's "Ask the Expert" webinars anytime to learn more about PROC REPORT:

 

Meet our SAS Press Author of the Month -- Jane Eslinger was published on SAS Users.

8月 012022
 

Recently, the SAS Press team moved to a new building on the SAS campus. And when the SAS Press team moves, we bring a lot of books with us! Packing and organizing all of our books gave us a chance to appreciate all of our authors’ hard work during the more than 30 years that SAS Press has existed.

One author has an outsized presence on the SAS Press bookshelves – Ron Cody. He has written over a dozen books that include some of the most popular titles for new SAS users. He taught statistics at the Rutgers Robert Wood Johnson Medical School for many years and is a frequent presenter at SAS conferences.

Few people know more about SAS than Ron, which made him the perfect person to be our first SAS Press Author of the Month. During August, you can get special deals on all of Ron's books at RedShelf. Find out more information on his author page. Ron is also hosting a free webinar on data cleaning tips and tricks on August 11th.

We recently asked Ron to share a little bit about his journey as an author and teacher. As you might imagine, Ron has a lot of SAS knowledge and advice to share.

Ron Cody's books on a bookshelf.

The Cody section of the SAS Press bookshelf.

Q: When did you decide to write your first book?

I decided to write my first SAS book (Applied Statistics and the SAS Programming Language) in 1985. It was published by Elsevier Publishing, which was later bought by Prentice Hall. At the time I was writing the book, there were no other books out there—just SAS manuals. This book is still in print (in a fifth edition).

Q: What made you decide to keep writing more books about SAS?

Once I realized I could write well enough to get published, I got the "writer's bug." Although writing a book is hard work, the reward is substantial. My motivation is more about teaching people about SAS and statistics rather than the monetary reward. My goal in writing any book is to make enough money to take my wife to dinner!

Q: Is one of your books your favorite? Or do you love them all equally?

I do have some favorites. I would put Learning SAS by Example as one. It contains a section for beginners as well as later chapters that are useful to intermediate, or even advanced SAS programmers. I particularly like my latest book, A Gentle Introduction to Statistics Using SAS Studio in the Cloud. I think I made statistical concepts accessible to non-mathematically minded people. There are only a few equations in the entire book.

Q: As a teacher, how did you encourage students who were having a hard time understanding statistics?

I try really hard to convince them that the statistical concepts really make sense, and it is sometimes the terminology that gets in the way.

Q: What do you think students struggle with the most when they are learning SAS?

I believe the overwhelming richness of SAS can intimidate a beginning programmer. That's why I start from simple examples and explain step-by-step how each program works.

Q: What is your best advice for someone who wants to learn SAS?

Buy all of my books, of course! Just kidding. There are many YouTube videos and other online resources that are useful. I think two of the best books would be The Little SAS Book, and either Learning SAS by Example (for the more serious student) or Getting Started with SAS Programming Using SAS Studio in the Cloud. The latter book is more suited to someone using SAS OnDemand for Academics.

Q: You recently published a memoir about your time as an EMT. How did it feel to reflect back on that time of your life? Any more memoirs in your future?

I thoroughly enjoyed writing a book about my years as a volunteer EMT (10-8 Awaiting Crew: Memories of a Volunteer EMT). I was fortunate that I had kept a journal and recorded details of some of the more interesting and exciting calls. As of now, I do not have another memoir in my future, but I am working on a nonfiction novel. I'm not sure how successful it will be, but I'm going to give it a try. I strive to write almost every day. I tell other beginning authors, that if you spend a few hours writing every day, it becomes much easier. I call it "getting in the groove."


Ron Cody sitting in a chair with a cat in his lap. Ron is wearing a shirt that says "using Statistics to prove a point is just MEAN."

Ron and Dudley. Credit: Jan Cody

Thanks for talking to us, Ron! We hope you will join us in celebrating Ron's accomplishments as our Author of the Month. Check out his author page to see special deals on all of Ron’s books during August and to discover more resources.

 

Meet our SAS Press Author of the Month – Ron Cody was published on SAS Users.

4月 052022
 

Have you ever heard something referred to as the bee’s knees? Most likely the person uttering that expression meant that it was truly amazing and extraordinary. Maybe you stopped and pondered the origin of the phrase. Well wonder no more! In the 1920s in the United States, people were obsessed with rhyming and anthropomorphism (giving human traits or attributes to animals). This era saw the introduction of multiple animal-centric idioms: the ant’s pants, the cat’s pajamas, and (you guessed it) the bee’s knees. Most likely this idiom refers to the pollen basket on the lower section of a worker bee’s legs (where the tibia and the tarsi connect). When bees are out foraging, they carry all the pollen they find in the pollen baskets on their legs. If you ever see a bee with lots of pollen on her legs, you know she’s been working hard… one might even say she’s an amazing and extraordinary worker!

SAS Visual Analytics has a lot of features that you might call the bee’s knees, but one of the most amazing and extraordinary features is the AggregateTable operator. This operator, introduced in SAS Visual Analytics 8.3, enables you to perform aggregations on data crossings that are independent of (or changed from) the data in your objects. This means you can use this operator to compare aggregations for different groups of data in one single object.

To illustrate the AggregateTable operator in action (and to keep with the theme), let’s consider an example.

I’m a hobby beekeeper in Texas. This means that I own and maintain a few hives in my backyard from which I collect honey and wax to sell to family and friends. I’m interested in learning about honey production in the United States for different years. I’m pretty sure I’m not the biggest honey producer in my state (or even my county), but I want to look at different crossings of total production (by state and year, by county and year, and by state).

The United States Department of Agriculture’s National Agricultural Statistics Service has Census data on honey production (measured in pounds) for all counties in the United States for 2002, 2007, 2012, and 2017.

Type of Aggregation: Add

To start, I would like to calculate the total production for each state (Maximum by State) and display the total production for the state that produced the most honey in that year. For example, in 2002 California produced the most honey of any state (15,585,925 pounds) and in 2017 North Dakota produced the most honey of any state (24,296,437 pounds).

Because the table contains details by county and I’m interested in the total production by state, I will either need to create an aggregated data source that contains details by state, or I will need to use the AggregateTable operator. Since this post is about the wonders of the AggregateTable operator, let’s focus on that.

The AggregateTable operator requires five parameters:

Parameter Description
Aggregation- Aggregated The aggregation applied to the aggregated item when it is used in an object that displays fewer group-by crossings than the table in the expression.
Aggregation- Measure The aggregation applied to the measure in the inner table context.
Type of aggregation The type of aggregation that is performed. Values are Fixed, Add, or Remove.
Categories The list of categories used to alter the data crossing for the aggregation.
Measure The measure that is aggregated. A Table operator can be added as the measure to perform a nested aggregation.

It also has a nested operator, Table, that creates an intermediate table defined by the Type of aggregation, Categories, Measure, and Aggregation- Measure parameters.

For this example, I want to use a Key value object to see the total production values for the state that produced the most honey in each year. The object will contain one group-by crossing (or category): Year. The calculation, however, will need to use two group-by crossings to determine the highest producing state for each year: Year and State. Therefore, the Aggregation-Measure is _Sum_ (to calculate the total production by state), the Type of aggregation is Add (because I want to add a crossing for State to the calculation), Categories is set to State, and Measure is Production (lb).

The intermediate table will contain one row for each state and year and contain total production values.

Then, for each year, I want the highest production value (for 2002, 15,585,925 pounds). Therefore, the Aggregation- Aggregated parameter should be _Max_ to grab the maximum values for each year from the intermediate table.

Then, I can display the Maximum by State with Year in a Key value object.

Note: Beginning in SAS Visual Analytics 2021.1.1 (May 2021), a new role is available for the Key value object, Lattice category. This enables you to display a key value object for each distinct value of a category data item (in this example, Year).

Now that I have a data item that contains the production amount for the highest producing state for each year, I can create some more complex calculations, like the percentage of total production for each year by the state that had the highest production. This will enable me to see if the highest producing state is doing all the heavy lifting or if all the states are producing a near-equal amount.

Type of Aggregation: Remove

The Type of aggregation parameter also enables you to remove categories from the calculation. For this example, suppose I want to compare the production in each county to the production from the highest producing county in that state (Maximum by County). I want to use a list table to compare these values.

The object will contain three group-by crossings: Year, State, and County. The calculation, however, will only use two group-by crossings to determine the highest producing county in each state for each year: Year and State. Therefore, the Aggregation-Measure is _Max_ (to calculate the maximum production in each state), the Type of aggregation is Remove (because I want to remove a crossing for County from the calculation), Categories is set to County, and Measure is Production (lb).

The intermediate table will contain one row for each state and year and contain the production values for the county in that state with the highest production. Notice that for this table, the aggregation for Production was set to Maximum to show the maximum production for each state.

Because the number of groupings in the object (County, Year, and State) is not fewer than the number of groupings in the calculation (Year and State), the Aggregation-Aggregated parameter is not applied and can be set to any value.

Then, I can display Maximum by County in a list table with total production by county to easily compare each county’s production with the production of the county in that state that had the highest production.

Now, I can calculate each county’s difference from the county with the highest production in that state for each year.

Type of Aggregation: Fixed

The Type of aggregation parameter also enables you to set a fixed category for the calculation. For this example, suppose I want to compare the production in each county to the total production across all years by state (Total by State). I want to use a list table to compare these values.

The object will contain three group-by crossings: Year, State, and County. The calculation, however, will only use one group-by crossing to determine the total production by state across all years: State. Therefore, the Aggregation-Measure is _Sum_ (to calculate the total production by state across all years), the Type of aggregation is Fixed (because I want to fix the crossing to State for the calculation), Categories is set to State and Measure is Production (lb).

The intermediate table will contain one row for each state and total production values across all years.

Because the number of groupings in the object (County, Year, and State) is not fewer than the number of groupings in the calculation (State), the Aggregation-Aggregated parameter is not applied and can be set to any value.

Then, I can display Total by State in a list table with total production by county to easily compare each county’s production with total production in the state across all years.

I can even compare total state values for each year with total production in that state across all years.

Then, I can calculate the share of total production produced each year.

For more information about how to create advanced data items and filters for your SAS Visual Analytics reports, including examples that use different types of operators, check out my book Interactive Reports in SAS Visual Analytics: Advanced Features and Customization.

Bees are important contributors in pollinating most of the food that you eat and are in desperate need of your help! There are many ways you can help the honeybee population thrive:

  • Become a beekeeper
  • Plant a garden for all pollinators (including bumblebees, butterflies, bats, and moths)
  • Reduce or eliminate pesticide use
  • Support your local beekeepers by buying local honey
  • Contact a beekeeping group if you see a swarm
  • Volunteer as a citizen data scientist by taking pictures and posting information on the iNaturalist app
  • Create a bee bath for those hot summer days
  • Build homes for native bees

 

The AggregateTable operator… It’s the bee’s knees! was published on SAS Users.

2月 012022
 

When we moved out to the country with our two dogs, our oldest dog Todd suddenly decided he liked to howl…. And he would do so every time we left the house. Maybe it was the country air? Maybe it was a time-lapse gene? Maybe he just wanted to learn something new?

If you’ve been using SAS Visual Analytics for a while, it’s possible you might have never created reports that linked to external URLs. SAS Visual Analytics can do so much on its own, perhaps you never thought about extending its functionality outside the product itself! Well, it’s time to learn a new trick.

To illustrate how this can be done (and to keep with the theme), let’s consider an example.

I’m interested in adding a new member to my family (a dog!), and I know I would like to adopt an animal in need. I’m not sure, however, which breed will suit my lifestyle. I need a dog that’s playful and sweet, but one that also likes to sleep late.

I have a report in SAS Visual Analytics that shows details about animals surrendered at an Austin animal shelter. I’d like to select various characteristics (like animal type, sex, whether the dog is spayed or neutered, and condition) and see what breeds they have available. Then, I’d like to see additional details about each breed at the American Kennel Club website (www.akc.org). On this website, you can find information about various breeds of dogs (and cats!), including average sizes, life expectancy, personality, and many other traits.

I’ll add an interactive link to the report, so when a user selects a specific breed, the page for that breed appears. The interactive link will use parameters to pass a selected value from the report to the web page.

To create interactive links, I like to follow four simple steps:

  1. Research the structure of the URL
  2. Use a hardcoded value to test the link
  3. Parameterize the link
  4. Test the parameterized value

Step 1: Research

Before adding interactive links to a report, you need to understand how the target web page structures the URL. I typically do this by accessing the target web page and searching for a specific subject. For some websites, you might need to view the Developer Guide for the website to fully understand the structure.

Typically, URLs are constructed in one of three ways:

  • Path: In these URLs, the subject is added at the end of the URL. For example, to view a country page on Wikipedia, you use the following URL: https://en.wikipedia.org/wiki/country where country is the full name of the country of interest.
  • Query: In these URLs, the subject is assigned to the value of a URL parameter using a sequence of attribute-value pairs: ?parameter1=value1&parameter2=value2. Multiple parameters can be assigned by separating the attribute-value pairs with an ampersand (&). For example, to search Etsy for a specific type of item, you use the following URL: https://www.etsy.com/search?q=item where item is the specific search string.
  • File: In these URLs, the subject is a part of a file name at the end of the URL. For example, to view a country profile on CIA Factbook, you use the following URL: https://www.cia.gov/library/publications/the-world-factbook/geos/country-code.html where country-code is the 2-letter abbreviation of the country of interest.

To start, I’ll select one of the breeds in the list: Australian Cattle Dog (my current dog’s breed). The American Kennel Club website has a drop-down selector at the top of the page where you can select a breed.

Australian cattle dogs are alert, curious, and pleasant. Tell me about it! He won’t let a leaf fly by outside without raising the alarm.

The URL is constructed with the breed as part of the path: https://www.akc.org/dog-breeds/australian-cattle-dog/. Notice that for breeds with multiple words (like Australian Cattle Dog), the link uses hyphens (-) instead of spaces.

Step 2: Hardcode

Now that you understand the structure of the URL, you can test the link using various hardcoded values. For example, to view details about dachshunds, go to https://www.akc.org/dog-breeds/dachshund/. Dachshunds are friendly, curious, and spunky. They must be! Why else would they be chosen to star in dog races at Oktoberfest celebrations around the world?

Step 3: Parameterize

After you have tested the URL using hardcoded values, you need to replace the hardcoded value with parameters. These are values that will be passed from your report to the external URL to make the links interactive. For the report, I’ll add the link to the word cloud and replace the hardcoded breed with the breed I select in the report.

Because the URL replaces spaces with hyphens, I have created a calculated item in SAS Visual Analytics that has breeds with multiple words separated by slashes instead of spaces, Breed (ForLink).

Because I want to pass Breed (ForLink) from the word cloud to the URL, I need to add it to one of the roles for the word cloud. I don’t want the breeds to appear with hyphens in the word cloud, so I’ll add the calculated item to the Hidden role.

Tip: Data items assigned to the Hidden role are available for color-mapped display rules, external links, and mapping data sources and should only be assigned if it will not increase the number of rows in the query. In this example, the word cloud shows details about breeds. Adding Breed (ForLink) to the Hidden role makes the value available for the external link and does not increase the number of rows in the query.

Then, to add the link:

  1. With the word cloud selected, click Actions in the right pane and expand URL Links.
  2. Click New URL Link.
  3. Specify a descriptive name for the link.
  4. For the URL, enter the URL up to, but not including, the breed (https://www.akc.org/dog-breeds/); this value will be passed from the selected breed in the word cloud.
  5. Next to Parameters, click the Add icon.
  6. For the Source field, select Breed (ForLink) and leave the Target value blank. Adding nothing to the Target field indicates that the value of Breed (ForLink) should be appended to the end of the URL.

When a viewer selects a breed in the word cloud, the breed value will be appended to the end of the URL and details for that breed will be displayed.

Step 4: Test

After the interactive link has been created, you need to ensure that the link works by testing it in the report.

I’ll select both Cat and Dog as the type of animal, Male for sex, Yes for spayed or neutered, and Aged for condition. There are 143 animals in the Austin animal shelter that meet these criteria. I’m thinking a Labrador retriever might be good for my family, so I’ll double-click Labrador Retriever in the word cloud to see the traits and characteristics for that breed. It looks like Labrador retrievers are friendly, active, and outgoing, and they are also highly adaptable (meaning I can train them to sleep late). It sounds like a perfect fit!

For more information about how to add interactive links to your SAS Visual Analytics reports, including examples on creating links with different URL structures, check out my book Interactive Reports in SAS Visual Analytics: Advanced Features and Customization.

You can’t teach an old dog new tricks… or can you? was published on SAS Users.

2月 012022
 

When we moved out to the country with our two dogs, our oldest dog Todd suddenly decided he liked to howl…. And he would do so every time we left the house. Maybe it was the country air? Maybe it was a time-lapse gene? Maybe he just wanted to learn something new?

If you’ve been using SAS Visual Analytics for a while, it’s possible you might have never created reports that linked to external URLs. SAS Visual Analytics can do so much on its own, perhaps you never thought about extending its functionality outside the product itself! Well, it’s time to learn a new trick.

To illustrate how this can be done (and to keep with the theme), let’s consider an example.

I’m interested in adding a new member to my family (a dog!), and I know I would like to adopt an animal in need. I’m not sure, however, which breed will suit my lifestyle. I need a dog that’s playful and sweet, but one that also likes to sleep late.

I have a report in SAS Visual Analytics that shows details about animals surrendered at an Austin animal shelter. I’d like to select various characteristics (like animal type, sex, whether the dog is spayed or neutered, and condition) and see what breeds they have available. Then, I’d like to see additional details about each breed at the American Kennel Club website (www.akc.org). On this website, you can find information about various breeds of dogs (and cats!), including average sizes, life expectancy, personality, and many other traits.

I’ll add an interactive link to the report, so when a user selects a specific breed, the page for that breed appears. The interactive link will use parameters to pass a selected value from the report to the web page.

To create interactive links, I like to follow four simple steps:

  1. Research the structure of the URL
  2. Use a hardcoded value to test the link
  3. Parameterize the link
  4. Test the parameterized value

Step 1: Research

Before adding interactive links to a report, you need to understand how the target web page structures the URL. I typically do this by accessing the target web page and searching for a specific subject. For some websites, you might need to view the Developer Guide for the website to fully understand the structure.

Typically, URLs are constructed in one of three ways:

  • Path: In these URLs, the subject is added at the end of the URL. For example, to view a country page on Wikipedia, you use the following URL: https://en.wikipedia.org/wiki/country where country is the full name of the country of interest.
  • Query: In these URLs, the subject is assigned to the value of a URL parameter using a sequence of attribute-value pairs: ?parameter1=value1&parameter2=value2. Multiple parameters can be assigned by separating the attribute-value pairs with an ampersand (&). For example, to search Etsy for a specific type of item, you use the following URL: https://www.etsy.com/search?q=item where item is the specific search string.
  • File: In these URLs, the subject is a part of a file name at the end of the URL. For example, to view a country profile on CIA Factbook, you use the following URL: https://www.cia.gov/library/publications/the-world-factbook/geos/country-code.html where country-code is the 2-letter abbreviation of the country of interest.

To start, I’ll select one of the breeds in the list: Australian Cattle Dog (my current dog’s breed). The American Kennel Club website has a drop-down selector at the top of the page where you can select a breed.

Australian cattle dogs are alert, curious, and pleasant. Tell me about it! He won’t let a leaf fly by outside without raising the alarm.

The URL is constructed with the breed as part of the path: https://www.akc.org/dog-breeds/australian-cattle-dog/. Notice that for breeds with multiple words (like Australian Cattle Dog), the link uses hyphens (-) instead of spaces.

Step 2: Hardcode

Now that you understand the structure of the URL, you can test the link using various hardcoded values. For example, to view details about dachshunds, go to https://www.akc.org/dog-breeds/dachshund/. Dachshunds are friendly, curious, and spunky. They must be! Why else would they be chosen to star in dog races at Oktoberfest celebrations around the world?

Step 3: Parameterize

After you have tested the URL using hardcoded values, you need to replace the hardcoded value with parameters. These are values that will be passed from your report to the external URL to make the links interactive. For the report, I’ll add the link to the word cloud and replace the hardcoded breed with the breed I select in the report.

Because the URL replaces spaces with hyphens, I have created a calculated item in SAS Visual Analytics that has breeds with multiple words separated by slashes instead of spaces, Breed (ForLink).

Because I want to pass Breed (ForLink) from the word cloud to the URL, I need to add it to one of the roles for the word cloud. I don’t want the breeds to appear with hyphens in the word cloud, so I’ll add the calculated item to the Hidden role.

Tip: Data items assigned to the Hidden role are available for color-mapped display rules, external links, and mapping data sources and should only be assigned if it will not increase the number of rows in the query. In this example, the word cloud shows details about breeds. Adding Breed (ForLink) to the Hidden role makes the value available for the external link and does not increase the number of rows in the query.

Then, to add the link:

  1. With the word cloud selected, click Actions in the right pane and expand URL Links.
  2. Click New URL Link.
  3. Specify a descriptive name for the link.
  4. For the URL, enter the URL up to, but not including, the breed (https://www.akc.org/dog-breeds/); this value will be passed from the selected breed in the word cloud.
  5. Next to Parameters, click the Add icon.
  6. For the Source field, select Breed (ForLink) and leave the Target value blank. Adding nothing to the Target field indicates that the value of Breed (ForLink) should be appended to the end of the URL.

When a viewer selects a breed in the word cloud, the breed value will be appended to the end of the URL and details for that breed will be displayed.

Step 4: Test

After the interactive link has been created, you need to ensure that the link works by testing it in the report.

I’ll select both Cat and Dog as the type of animal, Male for sex, Yes for spayed or neutered, and Aged for condition. There are 143 animals in the Austin animal shelter that meet these criteria. I’m thinking a Labrador retriever might be good for my family, so I’ll double-click Labrador Retriever in the word cloud to see the traits and characteristics for that breed. It looks like Labrador retrievers are friendly, active, and outgoing, and they are also highly adaptable (meaning I can train them to sleep late). It sounds like a perfect fit!

For more information about how to add interactive links to your SAS Visual Analytics reports, including examples on creating links with different URL structures, check out my book Interactive Reports in SAS Visual Analytics: Advanced Features and Customization.

You can’t teach an old dog new tricks… or can you? was published on SAS Users.

1月 112022
 

Last year, I wrote a blog demonstrating how to use the %Auto_Outliers macro to automatically identify possible data errors. This blog demonstrates a different approach—one that is useful for variables for which you can identify reasonable ranges of values for each variable. For example, you would not expect resting heart rates below 40 or over 100 or adult heights below 45 inches or above 84 inches. Although values outside those ranges might be valid values, it would be a good idea to check those out-of-range values to see if they are real or data errors.

In the third edition of my book, Cody's Data Cleaning Techniques, I present two macros: %Errors and %Report. These two macros provide a consolidated error report.

To demonstrate these two macros, I created a data set called Patients. A listing of the first 10 observations is shown below:

Notice that the unique identifier is called Patno, and you see three variables HR (heart rate), SBP (systolic blood pressure), and DBP (diastolic blood pressure).

The calling arguments for the %Errors macro are:

  • VAR=     A variable for which you have pre-defined bounds
  • Low=     The lowest reasonable value for this variable
  • High=    The highest reasonable value for this variable
  • Missing=Error or Missing=Ignore. The default for this argument is IGNORE, but it is still good practice to include it in the call so that anyone reading your program understands how missing values are being handled.

Because you might be calling this macro for many variables, the values of two macro variables &Dsn (data set name) and &IDVar (the identifying variable such as Patno or Subj) are assigned values once, using two %Let statements. You can then call the %Errors macro for each variable of interest. When you are finished, call the %Report macro to see a consolidated report of your possible errors.

Here is an example:

/* Set values for &Dsn and %IDVar with %LET statements */
%let Dsn = Clean.Patients;   
%let Idvar = Patno;
 
%Errors(Var=HR, Low=40, High=100, Missing=error)
%Errors(Var=SBP, Low=80, High=200, Missing=ignore)
%Errors(Var=DBP, Low=60, High=120)   
 
/* When you are finished selecting variables, create the report */
%Report

You are reporting all heart rates below 40 or above 100 (and considering missing values as errors); values of SBP below 80 or above 200 (and ignoring missing values); and values of DBP below 60 or above 120 (also ignoring missing values—using the default value of IGNORE).

Here is the result:

Notice that several patients with missing values for HR are flagged as errors. I should point out that I have violated a cardinal rule of macro programming: never write a macro that has no arguments—it should be a program. However, I liked the idea of calling %Errors and then %Report. Shown below are the two macros:

/****************************************************************
| PROGRAM NAME: ERRORS.SAS  in c:\Books\Cleans\Patients         |
| PURPOSE: Accumulates errors for numeric variables in a SAS    |
|          data set for later reporting.                        |
|          This macro can be called several times with a        |
|          different variable each time. The resulting errors   |
|          are accumulated in a temporary SAS data set called   |
|          Errors.                                              |
| ARGUMENTS: Dsn=    - SAS data set name (assigned with a %LET) |
|            Idvar=  - Id variable (assigned with a %LET)       |
|                                                               |
|            Var     = The variable name to test                |
|            Low     = Lowest valid value                       |
|            High    = Highest valid value                      |
|            Missing = IGNORE (default) Ignore missing values   |
|                      ERROR Missing values flagged as errors   |
|                                                               |
| EXAMPLE: %let Dsn = Clean.Patients;                           |
|          %let Idvar = Patno;                                  |
|                                                               |
|          %Errors(Var=HR, Low=40, High=100, Missing=error)     |
|          %Errors(Var=SBP, Low=80, High=200, Missing=ignore)   |
|          %Errors(Var=DBP, Low=60, High=120)                   |
|          Test the numeric variables HR, SBP, and DBP in data  |
|          set Clean.patients for data outside the ranges       |
|          40 to 100, 80 to 200, and 60 to 120 respectively.    |
|          The ID variable is PATNO and missing values are to   |
|          be flagged as invalid for HR but not for SBP or DBP. |
****************************************************************/
%macro Errors(Var=,    /* Variable to test     */
              Low=,    /* Low value            */
              High=,   /* High value           */
              Missing=IGNORE 
                       /* How to treat missing values         */
                       /* Ignore is the default.  To flag     */
                       /* missing values as errors set        */
                       /* Missing=error                       */);
data Tmp;
   set &Dsn(keep=&Idvar &Var);
   length Reason $ 10 Variable $ 32;
   Variable = "&Var";
   Value = &Var;
   if &Var lt &Low and not missing(&Var) then do;
      Reason='Low';
      output;
   end;
 
   %if %upcase(&Missing) ne IGNORE %then %do;
      else if missing(&Var) then do;
         Reason='Missing';
         output;
      end;
   %end;
 
   else if &Var gt &High then do;
      Reason='High';
      output;
      end;
      drop &Var;
   run;
 
   proc append base=Errors data=Tmp;
   run;
 
%mend Errors;

The basic idea for the %Errors macro is to test each variable and, if it is a possible error, use PROC APPEND to add it to a data set called Errors. When the first error is detected, PROC APPEND creates the data set Errors. From then on, each observation in data set Tmp is added to data set Errors.

Most of this macro is straightforward. For those readers who are not that comfortable with macro programming, the third section (beginning with %if %upcase(&Missing)) is executed only when the value of the macro variable &Missing is not equal to IGNORE.

Below is a listing of the %Report macro:

%macro Report;
   proc sort data=Errors;
      by &Idvar;
   run;
 
   proc print data=Errors;
      title "Error Report for Data Set &Dsn";
      id &Idvar;
      var Variable Value Reason;
   run;
 
   proc delete data=Errors Tmp;
   run;
 
%mend Report;

The %Report macro is mainly a PROC PRINT of the temporary data set Errors. I added PROC DELETE to delete the two temporary data sets Error and Tmp.

You can cut and paste these macros, or you can download all of the macros, programs, and data sets from Cody's Data Cleaning Techniques Using SAS®, Third Edition, by going to support.sas.com/Cody, search for the book, then click Example Code and Data. You do not have to buy the book to download all the files (although I would be delighted if you did). This is true for all of my books published by SAS Press.

Comments and/or corrections are always welcome.

Two macros for detecting data errors was published on SAS Users.

1月 112022
 

Last year, I wrote a blog demonstrating how to use the %Auto_Outliers macro to automatically identify possible data errors. This blog demonstrates a different approach—one that is useful for variables for which you can identify reasonable ranges of values for each variable. For example, you would not expect resting heart rates below 40 or over 100 or adult heights below 45 inches or above 84 inches. Although values outside those ranges might be valid values, it would be a good idea to check those out-of-range values to see if they are real or data errors.

In the third edition of my book, Cody's Data Cleaning Techniques, I present two macros: %Errors and %Report. These two macros provide a consolidated error report.

To demonstrate these two macros, I created a data set called Patients. A listing of the first 10 observations is shown below:

Notice that the unique identifier is called Patno, and you see three variables HR (heart rate), SBP (systolic blood pressure), and DBP (diastolic blood pressure).

The calling arguments for the %Errors macro are:

  • VAR=     A variable for which you have pre-defined bounds
  • Low=     The lowest reasonable value for this variable
  • High=    The highest reasonable value for this variable
  • Missing=Error or Missing=Ignore. The default for this argument is IGNORE, but it is still good practice to include it in the call so that anyone reading your program understands how missing values are being handled.

Because you might be calling this macro for many variables, the values of two macro variables &Dsn (data set name) and &IDVar (the identifying variable such as Patno or Subj) are assigned values once, using two %Let statements. You can then call the %Errors macro for each variable of interest. When you are finished, call the %Report macro to see a consolidated report of your possible errors.

Here is an example:

/* Set values for &Dsn and %IDVar with %LET statements */
%let Dsn = Clean.Patients;   
%let Idvar = Patno;
 
%Errors(Var=HR, Low=40, High=100, Missing=error)
%Errors(Var=SBP, Low=80, High=200, Missing=ignore)
%Errors(Var=DBP, Low=60, High=120)   
 
/* When you are finished selecting variables, create the report */
%Report

You are reporting all heart rates below 40 or above 100 (and considering missing values as errors); values of SBP below 80 or above 200 (and ignoring missing values); and values of DBP below 60 or above 120 (also ignoring missing values—using the default value of IGNORE).

Here is the result:

Notice that several patients with missing values for HR are flagged as errors. I should point out that I have violated a cardinal rule of macro programming: never write a macro that has no arguments—it should be a program. However, I liked the idea of calling %Errors and then %Report. Shown below are the two macros:

/****************************************************************
| PROGRAM NAME: ERRORS.SAS  in c:\Books\Cleans\Patients         |
| PURPOSE: Accumulates errors for numeric variables in a SAS    |
|          data set for later reporting.                        |
|          This macro can be called several times with a        |
|          different variable each time. The resulting errors   |
|          are accumulated in a temporary SAS data set called   |
|          Errors.                                              |
| ARGUMENTS: Dsn=    - SAS data set name (assigned with a %LET) |
|            Idvar=  - Id variable (assigned with a %LET)       |
|                                                               |
|            Var     = The variable name to test                |
|            Low     = Lowest valid value                       |
|            High    = Highest valid value                      |
|            Missing = IGNORE (default) Ignore missing values   |
|                      ERROR Missing values flagged as errors   |
|                                                               |
| EXAMPLE: %let Dsn = Clean.Patients;                           |
|          %let Idvar = Patno;                                  |
|                                                               |
|          %Errors(Var=HR, Low=40, High=100, Missing=error)     |
|          %Errors(Var=SBP, Low=80, High=200, Missing=ignore)   |
|          %Errors(Var=DBP, Low=60, High=120)                   |
|          Test the numeric variables HR, SBP, and DBP in data  |
|          set Clean.patients for data outside the ranges       |
|          40 to 100, 80 to 200, and 60 to 120 respectively.    |
|          The ID variable is PATNO and missing values are to   |
|          be flagged as invalid for HR but not for SBP or DBP. |
****************************************************************/
%macro Errors(Var=,    /* Variable to test     */
              Low=,    /* Low value            */
              High=,   /* High value           */
              Missing=IGNORE 
                       /* How to treat missing values         */
                       /* Ignore is the default.  To flag     */
                       /* missing values as errors set        */
                       /* Missing=error                       */);
data Tmp;
   set &Dsn(keep=&Idvar &Var);
   length Reason $ 10 Variable $ 32;
   Variable = "&Var";
   Value = &Var;
   if &Var lt &Low and not missing(&Var) then do;
      Reason='Low';
      output;
   end;
 
   %if %upcase(&Missing) ne IGNORE %then %do;
      else if missing(&Var) then do;
         Reason='Missing';
         output;
      end;
   %end;
 
   else if &Var gt &High then do;
      Reason='High';
      output;
      end;
      drop &Var;
   run;
 
   proc append base=Errors data=Tmp;
   run;
 
%mend Errors;

The basic idea for the %Errors macro is to test each variable and, if it is a possible error, use PROC APPEND to add it to a data set called Errors. When the first error is detected, PROC APPEND creates the data set Errors. From then on, each observation in data set Tmp is added to data set Errors.

Most of this macro is straightforward. For those readers who are not that comfortable with macro programming, the third section (beginning with %if %upcase(&Missing)) is executed only when the value of the macro variable &Missing is not equal to IGNORE.

Below is a listing of the %Report macro:

%macro Report;
   proc sort data=Errors;
      by &Idvar;
   run;
 
   proc print data=Errors;
      title "Error Report for Data Set &Dsn";
      id &Idvar;
      var Variable Value Reason;
   run;
 
   proc delete data=Errors Tmp;
   run;
 
%mend Report;

The %Report macro is mainly a PROC PRINT of the temporary data set Errors. I added PROC DELETE to delete the two temporary data sets Error and Tmp.

You can cut and paste these macros, or you can download all of the macros, programs, and data sets from Cody's Data Cleaning Techniques Using SAS®, Third Edition, by going to support.sas.com/Cody, search for the book, then click Example Code and Data. You do not have to buy the book to download all the files (although I would be delighted if you did). This is true for all of my books published by SAS Press.

Comments and/or corrections are always welcome.

Two macros for detecting data errors was published on SAS Users.

9月 132021
 

The Day of the Programmer is not enough time to celebrate our favorite code-creators. That’s why at SAS, we celebrate an entire week with SAS Programmer Week! If you want to extend the fun and learning of SAS Programmer Week year-round, SAS Press is here to support you with books for programmers at every level.

2021 has been a big year for learning, so we wanted to share the six most popular books for programmers this year. There are some old favorites on this list as well as some brand-new books on a variety of topics. Check out the list below, and see what your fellow programmers are reading this year!

  1. Little SAS Book: A Primer, Sixth Edition

This book is at the top of almost every list of recommended books for anyone who wants to learn SAS. And for good reason! It breaks down the basics of SAS into easy-to-understand chunks with tons of practice questions. If you are new to SAS or are interested in getting your basic certification, this is the book for you.

  1. Learning SAS by Example: A Programmer’s Guide, Second Edition

Whether you are learning SAS for the first time or just need a quick refresher on a single topic, this book is well-organized so that you can read start to finish or skip to your topic of interest. Filled with real-world examples, this is a book that should be on every SAS programmer’s bookshelf!

  1. Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS

If you work with big data, then you probably work with a lot of text. The third book on our list is for anyone who handles unstructured data. This book focuses on practical solutions to real-life problems. You’ll learn how to collect, cleanse, organize, categorize, explore, analyze, and interpret your data.

  1. End-to-End Data Science with SAS: A Hands-On Programming Guide

This book offers a step-by-step explanation of how to create machine learning models for any industry. If you want to learn how to think like a data scientist, wrangle messy code, choose a model, and evaluate models in SAS, then this book has the information that you need to be a successful data scientist.

  1. Cody's Data Cleaning Techniques Using SAS, Third Edition

Every programmer knows that garbage in = garbage out. Take out the trash with this indispensable guide to cleaning your data. You’ll learn how to find and correct errors and develop techniques for correcting data errors.

  1. SAS Graphics for Clinical Trials by Example

If you are a programmer who works in the health care and life sciences industry and want to create visually appealing graphs using SAS, then this book is designed specifically for you. You’ll learn how to create a wide range of graphs using Graph Template Language (GTL) and statistical graphics procedures to solve even the most challenging clinical graph problems.

An honorable mention also goes to the SAS Certification Guides. They are a great way to study for the certification exams for the SAS Certified Specialist: Base Programming and SAS Certified Professional: Advanced Programming credentials.

We have many books available to support you as you develop your programming skills – and some of them are free! Browse all our available titles today.

Top Books for SAS Programmers was published on SAS Users.

8月 032021
 

Thousands of SAS users are migrating from SAS University Edition to SAS OnDemand for Academics (ODA). I thought I would share some of my thoughts, having just finished two books using ODA (Getting Started with SAS Programming: Using SAS Studio in the Cloud and A Gentle Introduction to Statistics Using SAS Studio in the Cloud). Users of SAS University Edition had to download software to create a virtual computer on their local computer, then download the SAS software, and finally, set up a way to read and write files from their “real” computer to their “virtual computer.” This caused many people massive headaches (including me). The great news about SAS OnDemand for Academics is that you don’t have to download anything! You access SAS on a cloud platform. Also, reading data from your real computer is quite simple.

ODA was developed so that people can use it to learn how to program and run tasks using SAS. It is not to be used for commercial purposes. There is also a 5-gigabyte limit for data files, but that is certainly not a problem for learning how to use SAS.

Registering for ODA

To gain access to ODA, you need to register with SAS. Part of the registration process is to create a SAS profile. If you already have a SAS profile, skip that portion of the instructions.

To start, point your browser to:

https://welcome.oda.sas.com

You will have to wait for an email from SAS with an approval, but once you have access, it is a delight to use. Let me show you step-by-step how easy it is to upload an Excel workbook from your local computer to SAS Studio using ODA.

This first figure shows a worksheet (Drug_Study.xlsx) with information on four individuals.

Figure 1: Excel Spreadsheet

All the variable names should be self-explanatory except for SBP (systolic blood pressure) and DBP (diastolic blood pressure). It is a good idea to choose variable names that are compatible with SAS V7. The next step is to select a folder in the Server Files and Folder tab in SAS Studio and upload the file. This tab is shown in Figure 2 (with the UPLOAD icon circled).

Figure 2: Upload the File to a Folder in SAS Studio

Once you click the UPLOAD icon, you are given the option to select a file.

Figure 3: Select Your File

Select your file, click UPLOAD, and the file on your local hard drive (Drug_Study.xlsx) will appear in the SAS Studio folder (Figure 4). (Note: you can upload multiple files at one time if you choose.)

Figure 4: The File Drug_Study.xlsx in Now in the SAS Studio Folder

You can now use the IMPORT utility (under Tasks and Utilities) to convert this Excel file to a SAS data set. In this example, the default name of the WORK file (IMPORT) was changed to Drug_Study. A list of variables is shown in Figure 5.

Figure 5: List of Variable Names in Data Set Drug_Study

You can now write programs or use SAS Studio tasks to analyze this data set.

To learn more about how to use SAS Studio as part of SAS OnDemand for Academics, write SAS programs, or use SAS Studio tasks, please take a look at my new book, Getting Started with SAS Programing: Using SAS Studio in the Cloud.

I welcome comments and questions on this blog.

Migrating from SAS University Edition to SAS OnDemand for Academics was published on SAS Users.