#SASPress

2月 052020
 

One of the first and most important steps in analyzing data, whether for descriptive or inferential statistical tasks, is to check for possible errors in your data. In my book, Cody's Data Cleaning Techniques Using SAS, Third Edition, I describe a macro called %Auto_Outliers. This macro allows you to search for possible data errors in one or more variables with a simple macro call.

Example Statistics

To demonstrate how useful and necessary it is to check your data before starting your analysis, take a look at the statistics on heart rate from a data set called Patients (in the Clean library) that contains an ID variable (Patno) and another variable representing heart rate (HR). This is one of the data sets I used in my book to demonstrate data cleaning techniques. Here is output from PROC MEANS:

The mean of 79 seems a bit high for normal adults, but the standard deviation is clearly too large. As you will see later in the example, there was one person with a heart rate of 90.0 but the value was entered as 900 by mistake (shown as the maximum value in the output). A severe outlier can have a strong effect on the mean but an even stronger effect on the standard deviation. If you recall, one step in computing a standard deviation is to subtract each value from the mean and square that difference. This causes an outlier to have a huge effect on the standard deviation.

Macro

Let's run the %Auto_Outliers macro on this data set to check for possible outliers (that may or may not be errors).

Here is the call:

%Auto_Outliers(Dsn=Clean.Patients,
               Id=Patno,
               Var_List=HR SBP DBP,
               Trim=.1,
               N_Sd=2.5)

This macro call is looking for possible errors in three variables (HR, SBP, and DBP); however, we will only look at HR for this example. Setting the value of Trim equal to .1 specifies that you want to remove the top and bottom 10% of the data values before computing the mean and standard deviation. The value of N_Sd (number of standard deviations) specifies that you want to list any heart rate beyond 2.5 trimmed standard deviations from the mean.

Result

Here is the result:

After checking every value, it turned out that every value except the one for patient 003 (HR = 56) was a data error. Let's see the mean and standard deviation after these data points are removed.

Notice the Mean is now 71.3 and the standard deviation is 11.5. You can see why it so important to check your data before performing any analysis.

You can download this macro and all the other macros in my data cleaning book by going to support.sas.com/cody. Scroll down to Cody's Data Cleaning Techniques Using SAS, and click on the link named "Example Code and Data." This will download a file containing all the programs, macros, and data files from the book.  By the way, you can do this with any of my books published by SAS Press, and it is FREE!

Let me know if you have questions in the comments section, and may your data always be clean! To learn more about SAS Press, check out up-and-coming titles, and to receive exclusive discounts make sure to subscribe to the newsletter.

Finding Possible Data Errors Using the %Auto_Outliers Macro was published on SAS Users.

1月 132020
 

Are you ready to get a jump start on the new year? If you’ve been wanting to brush up your SAS skills or learn something new, there’s no time like a new decade to start! SAS Press is releasing several new books in the upcoming months to help you stay on top of the latest trends and updates. Whether you are a beginner who is just starting to learn SAS or a seasoned professional, we have plenty of content to keep you at the top of your game.

Here is a sneak peek at what’s coming next from SAS Press.
 
 

For students and beginners

For beginners, we have Exercises and Projects for The Little SAS® Book: A Primer, Sixth Edition, the best-selling workbook companion to The Little SAS Book by Rebecca Ottesen, Lora Delwiche, and Susan Slaughter. Exercises and Projects for The Little SAS® Book, Sixth Edition will be updated to match the updates to the new The Little SAS® Book: A Primer, Sixth Edition. This hands-on workbook is designed to hone your SAS skills whether you are a student or a professional.

 

 

For data explorers of all levels

This free e-book explores the features of SAS® Visual Data Mining and Machine Learning, powered by SAS® Viya®. Users of all skill levels can visually explore data on their own while drawing on powerful in-memory technologies for faster analytic computations and discoveries. You can manually program with custom code or use the features in SAS® Studio, Model Studio, and SAS® Visual Analytics to automate your data manipulation and modeling. These programs offer a flexible, easy-to-use, self-service environment that can scale on an enterprise-wide level. This book introduces some of the many features of SAS Visual Data Mining and Machine Learning including: programming in the Python interface; new, advanced data mining and machine learning procedures; pipeline building in Model Studio, and model building and comparison in SAS® Visual Analytics

 

 

For health care data analytics professionals

If you work with real world health care data, you know that it is common and growing in use from sources like observational studies, pragmatic trials, patient registries, and databases. Real World Health Care Data Analysis: Causal Methods and Implementation in SAS® by Doug Faries et al. brings together best practices for causal-based comparative effectiveness analyses based on real world data in a single location. Example SAS code is provided to make the analyses relatively easy and efficient. The book also presents several emerging topics of interest, including algorithms for personalized medicine, methods that address the complexities of time varying confounding, extensions of propensity scoring to comparisons between more than two interventions, sensitivity analyses for unmeasured confounding, and implementation of model averaging.

 

For those at the cutting edge

Are you ready to take your understanding of IoT to the next level? Intelligence at the Edge: Using SAS® with the Internet of Things edited by Michael Harvey begins with a brief description of the Internet of Things, how it has evolved over time, and the importance of SAS’s role in the IoT space. The book will continue with a collection of chapters showcasing SAS’s expertise in IoT analytics. Topics include Using SAS Event Stream Processing to process real world events, connectivity, using the ESP Geofence window, applying analytics to streaming data, using SAS Event Stream Processing in a typical IoT reference architecture, the role of SAS Event Stream Manager in managing ESP deployments in an IoT ecosystem, how to use deep learning with Your IoT Digital, accounting for data quality variability in streaming GPS data for location-based analytics, and more!

 

 

 

Keep an eye out for these titles releasing in the next two months! We hope this list will help in your search for a SAS book that will get you to the next step in updating your SAS skills. To learn more about SAS Press, check out our up-and-coming titles, and to receive exclusive discounts make sure to subscribe to our newsletter.

Foresight is 2020! New books to take your skills to the next level was published on SAS Users.

2月 212012
 
On Thursday, February 23, SAS Press acquisitions editor Shelley Sessoms and I will take questions about getting published, why you should publish with us and the publishing industry in general. If you like our books or have considered writing your own SAS or JMP book, join us on Twitter at [...]
11月 102011
 
Do you enjoy talking about SAS statistics and SAS books? Do you know what Twitter is? If you can honestly answer "yes" to both (or is that all 3) questions, then this post is for you. Tomorrow, SAS Publishing will host its second ever Twitter chat. This time around, we've invited SAS [...]