November 8, 2019
 

Six editions is a lot! If you had told us, back when we wrote the first edition of The Little SAS Book, that someday we would write a sixth, we would have wondered how we could possibly find that much to say. After all, it is supposed to be The Little SAS Book, isn’t it? But the developers at SAS Institute are constantly hard at work inventing new and better ways of analyzing and visualizing data. And some of those ways turn out to be so fundamental that they belong even in a little book about SAS.

Interface independence

One of the biggest changes to SAS software in recent years is the proliferation of interfaces. SAS programmers have more choices than ever before. Previous editions contained some sections specific to the SAS windowing environment (also called Display Manager). We wrote this edition for all SAS programmers, whether you use SAS Studio, SAS Enterprise Guide, the SAS windowing environment, or run in batch. That sounds easy, but it wasn’t. SAS behaves differently in different interfaces, and some of these differences are fundamental. In particular, the system option that sets the rules for names of variables varies depending on how you run SAS. So old sections had to be rewritten, and we added a whole new section showing how to use variable names containing blanks and special characters.

New ways to read and write Microsoft Excel files

Previous editions already covered how to read and write Microsoft Excel files, but SAS developers have created some great new ways. This edition contains new sections about the XLSX LIBNAME engine and the ODS EXCEL destination.

More PROC SQL

From the very first edition, The Little SAS Book has always covered PROC SQL. But it was in an appendix, and over time we noticed that most people ignore appendices. So for this edition, we removed the appendix and added new sections on using PROC SQL to:

  • Subset your data
  • Join data sets
  • Add summary statistics to a data set
  • Create macro variables with the INTO clause

For people who are new to SQL, these sections provide a good introduction; for people who already know SQL, they provide a model of how to leverage SQL in your SAS programs.

Updates and additions throughout the book

Almost every section in this edition has been changed in some way. We added new options, made sure everything is up-to-date, and ran every example in every SAS interface, noting any differences. For example, PROC SGPLOT has some new options, the default ODS style for PDF has changed, and the LISTING destination behaves differently in different interfaces. Here’s a short list, in no particular order, of new or expanded topics in the sixth edition:

  • More examples with permanent SAS data sets, CSV files, or tab-delimited files
  • More log notes throughout the book showing what to look for
  • LIKE or sounds-like (=*) operators in WHERE statements
  • CROSSLIST, NOCUM, and NOPRINT options in PROC FREQ
  • Grouping data with a user-defined format and the PUT function
  • Iterative DO groups
  • DO WHILE and DO UNTIL statements
  • %DO statements

Even though we have added a lot to this edition, it is still a little book.  In fact, this edition is shorter than the last—by twelve pages! We think this is the best edition yet.

October 22, 2019
 

I am excited to announce that the sixth edition of The Little SAS Book is now available. We spent over a year rewriting and updating, and this may well be the best edition yet.

You can download a sample chapter or purchase e-book versions (PDF, EPUB or Kindle) by visiting SAS Press’ site.

If, like me, you like to be able to flip the pages and make notes in the margin, then you can get a hard copy (in paperback or hardback!) from Amazon.

September 16, 2019
 

Just when you think you’ve seen it all, life can surprise you in a big way, making you wonder what else you've missed.

That is what happened when I recently had a chance to work with the SAS® Scalable Performance Data Server, a product that's been around a while, but one I had never crossed paths with before. I open an SPDS table of a hundred million records in SAS® Enterprise Guide, and I can scroll it as fast as if it were an Excel “baby” spreadsheet of a hundred rows. That’s how powerful it feels, to say nothing of the lightning speed of the queries.

What is the SAS Scalable Performance Data Server?

Also known as the SAS SPD Server (or SPDS), it's a data storage system designed for high-performance data delivery. Its primary purpose is to provide rapid queries of vast amounts of data. We are talking terabytes of data with tables containing billions of rows. SPDS employs parallel storage and efficient indexing, coupled with a multi-threaded server system concurrently processing tasks distributed across multiple processors.

The SPDS client available in SAS® Viya integrates SAS SPDS with SAS Viya, extending the functionality of its applications beyond the native Cloud Analytic Services (CAS), so you can continue to reap all the benefits of SAS SPDS.

SPDS library

In addition to connecting to SPD Server with explicit SQL pass-through, you can connect to SPD Server with a LIBNAME statement, for example:

libname mylibref sasspds 'serverdomain' host='nodename_or_ip' service='5400'
                         user='mySPDuserid' password='{SAS003}XXXXXXX...XXX';

This effectively creates an SPDS library, and the tables in that library can be referenced by the two-level name mylibref.tablename, as if this were a Base SAS library.

Cluster tables vs. member tables

Besides ordinary data tables, an SPDS library offers so-called dynamic cluster tables (clusters for short), which enable transparent access to large amounts of data.

Dynamic cluster tables (cluster tables or clusters) are virtual tables that allow users to access many server tables (member tables) as if they were one table. A dynamic cluster table is a collection of SPD Server tables that are presented to the end-user application as a single table through a metadata layer acting like a view.

Member tables can be added to the cluster as well as replaced and removed from the cluster.

The role of PROC SPDO

PROC SPDO is the SAS procedure for the SPD Server operator interface. It performs a wide range of SPD Server, user, and table management tasks:

  • create, list, modify, destroy, and undo dynamic cluster tables
  • add, remove, replace, and fix cluster table members
  • add, modify, list, and delete access control lists (ACLs) for server resources
  • define, describe, and remove WHERE constraints on tables for row-level security definition and management
  • issue system commands on server nodes

In addition to PROC SPDO, an SPD Server plug-in for SAS® Data Management Console is also available.

Retrieving SPDS library contents

If you open an SPDS library in SAS Enterprise Guide, you won’t be able to tell which table in that library is a member table and which is a cluster table: they all look the same. But in many cases, we need to know what is what. Moreover, for data-driven processing we need to capture the SPDS library objects into a dataset and identify whether each one is a cluster or a member table.

Luckily, PROC CONTENTS with the OUT= option allows us to do just that. While the MEMTYPE column is equal to 'DATA' for both clusters and member tables, there is another, lesser-known column with the transposed name TYPEMEM that has the value 'DATA' for clusters and a blank value ' ' for member tables. The following simple code retrieves the list of SPDS library objects into the WORK.SPDSTYPES dataset, where the TABLETYPE column specifies whether each library object MEMNAME is a cluster or a member:

proc contents data=SPDSLIB._all_ out=WORK.ALLOBJECTS (keep=MEMNAME TYPEMEM);
run;
 
proc sort data=WORK.ALLOBJECTS nodupkey;
   by MEMNAME;
run;
 
data WORK.SPDSTYPES;
   set WORK.ALLOBJECTS;
   attrib TABLETYPE $7 label='SPDS table type';
   select(TYPEMEM);
      when('DATA') TABLETYPE = 'CLUSTER';
      when('')     TABLETYPE = 'MEMBER';
      otherwise    TABLETYPE = '';
   end;
run;

In this code, PROC CONTENTS produces one record per column NAME in every object MEMNAME in the SPDS library; PROC SORT reduces (de-duplicates) this list to one record per MEMNAME; finally, the DATA step creates the TABLETYPE column indicating which MEMNAME is a CLUSTER and which is a MEMBER.
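For readers who think more easily in Python, the de-duplicate-and-classify logic of the SAS steps above can be sketched as follows. This is only an illustration of the logic, not a replacement for the SAS code: the sample rows and the table names SALES_ALL and SALES_2018 are hypothetical, and nothing here talks to an SPD Server.

```python
# Hypothetical rows shaped like the PROC CONTENTS OUT= data set:
# one row per column (NAME) of every library object (MEMNAME).
rows = [
    {"MEMNAME": "SALES_ALL", "NAME": "ID", "TYPEMEM": "DATA"},
    {"MEMNAME": "SALES_ALL", "NAME": "AMOUNT", "TYPEMEM": "DATA"},
    {"MEMNAME": "SALES_2018", "NAME": "ID", "TYPEMEM": ""},
    {"MEMNAME": "SALES_2018", "NAME": "AMOUNT", "TYPEMEM": ""},
]


def classify_members(rows):
    """Return {MEMNAME: TABLETYPE}, using the first row seen per MEMNAME."""
    table_types = {}
    for row in rows:
        name = row["MEMNAME"]
        if name in table_types:
            continue  # like NODUPKEY: later rows with the same key are dropped
        typemem = row["TYPEMEM"].strip()
        if typemem == "DATA":
            table_types[name] = "CLUSTER"
        elif typemem == "":
            table_types[name] = "MEMBER"
        else:
            table_types[name] = ""  # mirrors the OTHERWISE branch
    # PROC SORT also orders the result by MEMNAME.
    return dict(sorted(table_types.items()))


spds_types = classify_members(rows)
print(spds_types)
```

The result has one entry per library object, which is the shape you need for data-driven processing of clusters and members.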

Retrieving SPDS cluster’s member list

In addition to retrieving a list of objects in the SPDS library described above, we also need a way of capturing the content (a list of members) of the cluster itself in order to control removing or replacing its members. PROC SPDO’s CLUSTER LIST statement produces such a list, and its OUT= option allows you to dump that list into a dataset:

proc spdo lib=SPDSLIB;
   cluster list CLUSTER1 out=CLUSTER1_MEMBERS;
   cluster list CLUSTER2 out=CLUSTER2_MEMBERS;
   /* ... */
   cluster list CLUSTERN out=CLUSTERN_MEMBERS;
quit;

This approach creates one output table per cluster. You can’t use the same OUT= destination table for different clusters, because each subsequent CLUSTER LIST statement overwrites the table rather than appending to it.

If you need to capture the contents of several clusters in one dataset, you don’t have to output each cluster’s contents to a separate table and then append (concatenate) the tables. The good old ODS OUTPUT statement with the CLUSTERLIST= option does it in a single step:

ods noresults;
ods output clusterlist=WORK.CLUSTER_MEMS;
proc spdo lib=SPDSLIB;
   cluster list CLUSTER1;
   cluster list CLUSTER2;
   /* ... */
   cluster list CLUSTERN;
quit;
ods output close;
ods results;

As an additional bonus, ODS NORESULTS suppresses printed output when it’s not needed, e.g. for automatic data-driven processing.

Your thoughts?

What is your experience with SAS SPDS? How might you use it in the future? Please comment below.

How to retrieve contents of SAS® Scalable Performance Data Server library was published on SAS Users.

September 10, 2019
 

SASPy is a powerful Python library that interfaces with SAS and can help with your machine-learning solutions. SASPy was created for Python programmers to leverage the power of SAS within their Python scripts. If you are not familiar with SASPy, see the following resources:

This blog post shows you how powerful SASPy can be. SASPy helps you produce visuals and descriptive statistics quickly and accurately. To demonstrate this capability, let’s explore and prepare your data using SASPy.

Prerequisites

To get started, here is what you need:

  • The Census Income data set from the University of California Irvine’s Machine Learning Repository
    • Download the adult.data data set from the data folder.
    • Remove the missing values prior to exploring and preparing.
  • SAS® 9.4 or SAS® Viya® 3.1, or any later version of either
  • Jupyter Notebook
  • SASPy (To install SASPy, refer to the installation and configuration documentation.)

After verifying you have completed the above requirements, you can start your Jupyter Notebook and begin coding using SASPy.

Let's start by importing the libraries we will use in this example.

  1. Import the libraries:
  2. Start your SAS session. Use the command below to establish a connection.

A "SAS Connection established" message returns once connected. This example uses a local connection to SAS. However, you can use an STDIO connection or an IOM connection to SAS if you prefer. For more information, see SAS Configuration.

  3. Read in your data set. You have two options: You can either read in the data set using pandas and then read the data into a SAS data object, or you can read it directly into a SAS data object. This example shows reading the data directly into a SAS data object.

To access existing data in a SAS session, use the SAS data object. A SAS data object can be used to do the following:

  • Create various graphs such as histograms, scatter plots, heatmaps, and so on.
  • Display descriptive statistics.
  • Transfer data between a pandas data frame and a SAS data object.

The SAS data object is versatile. To view all of its capabilities, refer to the SAS Data Object documentation.

  4. Verify whether you successfully read in your data set:

Similar to pandas, SASPy has a head function to display the first data points. The only difference is in how you specify the number of data points you would like to see: with a SAS data object, you need to include “obs=n”.
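As a rough sketch of the read-and-verify steps, the helper below loads a CSV file into a SAS data object and returns its first rows. It relies on SASPy's read_csv method and on head with obs=, and it assumes an already-established session is passed in. The function name preview_table, the table name adult, and the file name adult.csv are hypothetical, and the sketch assumes the file has a header row, which the raw adult.data file does not.

```python
def preview_table(sas, csv_path, table="adult", rows=5):
    """Load a CSV file into a SAS data object and return it with its first rows.

    `sas` is an already-established saspy.SASsession (or any object that
    exposes the same read_csv method). `table` is the name given to the
    SAS data set that is created.
    """
    sas_data = sas.read_csv(file=csv_path, table=table)  # returns a SAS data object
    first_rows = sas_data.head(obs=rows)  # note obs=, not pandas' n=
    return sas_data, first_rows


# Typical use inside a notebook (requires a configured SAS connection):
#   import saspy
#   sas = saspy.SASsession()
#   adult, first_rows = preview_table(sas, "adult.csv")
```

Passing the session in as an argument keeps the helper independent of how you connect (local, STDIO, or IOM).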

Exploring your Data

SASPy provides many options to explore your data. This example uses a combination of SASPy functions and pandas to explore the data.

  1. Determine the number of records in your data:
  2. Determine how many individuals earn more or less than $50,000. For this step, this example uses pandas to demonstrate how you can switch between using SASPy and pandas seamlessly.
    1. Change your SAS data object into a pandas data frame:
    2. Use the value_counts function to determine how many individuals earn more or less than $50,000:
    3. View the percent of individuals whose income is greater than $50,000:
    4. Display all your values to gain an understanding of your data:

As you can see from the output above, there are 30,162 records. Of these, 7,508 individuals earn more than $50,000, and 22,654 individuals earn $50,000 or less. In other words, about 25% of individuals earn more than $50,000.
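The arithmetic behind these figures is easy to check with plain Python. The sketch below uses only the counts quoted above; the category labels "<=50K" and ">50K" are assumed here for illustration, and no SAS session or pandas installation is needed.

```python
from collections import Counter

# Counts quoted in the text (30,162 records in total).
income_counts = Counter({"<=50K": 22654, ">50K": 7508})

total = sum(income_counts.values())
share_over_50k = income_counts[">50K"] / total * 100

print(f"records: {total}")
print(f"share earning more than $50,000: {share_over_50k:.1f}%")
```

The share comes out at roughly 24.9%, which the text rounds to about 25%.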

  3. It is also important to look at your numerical features. Use SASPy to get a quick description of your data:

As you can see above, the table lists the mean, the median, and other summary statistics for each numerical feature.

Exploring your data is just the first step in generating your machine-learning solutions. This blog post described how to generate basic statistical values and display output using SASPy, pandas, and Python. Parts 2 and 3 of this blog post cover how to prepare your data using SASPy and then apply it to a machine-learning model.

For more information about the data set, see the UC Irvine Machine Learning Repository.

Machine learning with SASPy: Exploring and preparing your data (part 1) was published on SAS Users.

April 25, 2019
 

I’m excited because in a couple of days I will fly to Dallas for SAS Global Forum 2019, the biggest SAS conference of the year, attended by thousands.

If you are coming, I hope you will say hello to me. If you can’t make it to Dallas, you’ll be glad to know that many presentations will be livecast. Here is the schedule.

A few highlights:

Sunday, April 28, 7:00-8:30 pm CT–Opening Session

Monday, April 29, 8:30-10:00 am CT–General Session: Technology Connection

Tuesday, April 30, 3:00-4:00 pm CT–Career Advice We’d Give to Our Kids: A Panel Discussion

Wednesday, May 1, 10:30-11:30 am CT–The Good, the Bad, and the Creepy: Why Data Scientists Need to Understand Ethics

These presentations may not be available after the conference, so check the schedule and make sure to tune in at the right time.

April 21, 2019
 

This year I’ve had the honor of helping to recruit speakers for the Career Development area at SAS Global Forum. We have some fantastic presentations that everyone can benefit from whether you are a student, a new graduate, or a mid-career professional.

I particularly recommend the panel discussion (Career Advice We’d Give to Our Kids) on Tuesday, April 30, 3:00-4:00 in Level 2, Ballroom C4. The panelists (Shelley Blozis, AnnMaria De Mars, Paul LaBrec) are all great, so this should be both informative and entertaining.

The following presentations are listed in order by day and time. As you scroll through this list, you may notice that most (but not all!) of these presentations are in Level 1 Room D168.

Poster (available every day)
Tips to Ensure Success in Your New SAS Project
Flora Fang Liu

Tuesday, April 30, 2019

10:00-11:00 Level 1, D168
Don’t Just Survive, Thrive! A Learning-Based Strategy for a Modern Organization Centered Around SAS
Jovan Marjanovic

11:00-12:00 Level 1, D168
The Power of Know-How: Pump Up Your Professional Value by Refining Your SAS Skills
Gina Huff

1:00-1:15 Level 2, Exhibit Hall D, Super Demo 12
SAS Programming Exam Moves to Performance-Based Format
Mark Stevens

1:30-2:00 Level 1, D168
The Why and How of Teaching SAS to High School Students
Jennifer Richards

2:00-2:30 Level 1, D168
Puzzle Me, Puzzle You: How a Thought Experiment Became a Rubik’s Cube Among a Set of Fun Puzzles
Amit Patel, Lewis Mitchell

2:30-3:00 Level 1, D168
How to Land Work as a SAS Professional
Charu Shankar

3:00-3:15 Level 2, Exhibit Hall D, Super Demo 12
Take SAS Certification Exams from Home Online Proctored
Terry Barham

3:00-4:00 Level 2, Ballroom C4
Panel Discussion: Career Advice We’d Give to Our Kids
Shelley Blozis, AnnMaria De Mars, Paul LaBrec

3:00-4:00 Level 1, D168
How To Be an Effective Statistician
Alexander Schacht

4:00-5:00 Level 1, D168
Stories from the Trenches: Tips and Techniques for Career Advancement from a SAS Industry Recruiter
Molly Hall

5:00-5:30 Level 1, D168
How to HOW: Hands-on-Workshops Made Easy
Chuck Kincaid

Wednesday, May 1, 2019

10:00-11:00 Level 2, Ballroom C3
Tell Me a Data Story
Kat Greenbrook

10:00-11:00 Level 2, Ballroom C4
The Good, The Bad, and The Creepy: Why Data Scientists Need to Understand Ethics
Jennifer Priestley

11:30-12:00 Level 1, D168
New to SAS? Helpful Hints for Developing Your Own Professional Development Plan
Kelly Smith

February 13, 2019
 

SAS has worked with our exam delivery partners to integrate a live lab into an exam, which can be delivered anywhere, anytime, on-demand.

The post New Performance-Based Certification: Write SAS Code During Your Exam appeared first on SAS Learning Post.

January 14, 2019
 

New to SAS?  Here are tips from the translator of The Little SAS Book, Fifth Edition.

Hongqiu Gu, Ph.D. works at the China National Clinical Research Center for Neurological Diseases at the National Center for Healthcare Quality Management in Neurological Diseases at Beijing Tiantan Hospital, Capital Medical University.

He shared these important tips to learn SAS well:

1.  Read SAS Reference Books

I have not counted the number of SAS books I have read; I would estimate over 50 or 60.  The best books to give me a deep understanding of SAS are the SAS Reference Books, including SAS Language Reference Concepts, SAS Functions and CALL Routines Reference, SAS Macro Language Reference, and so on.  There are lots of excellent books published by SAS Press, and usually they are concise and suitable for quick learners.  However, when I realized that SAS could give me a powerful career advantage, I needed to learn SAS systematically and deeply.  I believe the SAS Reference Books are the most authoritative and comprehensive learning materials. Besides, all the updated SAS Reference Books are free to all readers.

2.  Use the SAS Help and Documentation frequently

No one can remember all the syntax or options in SAS. However, don’t worry: SAS Help and Documentation is our best friend. I use the SAS Help and Documentation quite often. Even as an experienced SAS user, I still find many situations in which I need to turn to SAS Help and Documentation. Every time I use it, I learn something new.

3.  Solve SAS-related questions in SAS communities

As the saying goes, practice makes perfect. Answering SAS-related questions is a good way to practice. Questions can come from daily work, from friends around you, or from other SAS users on the web. From 2013 to 2015, I spent a lot of time in the largest Chinese SAS online community answering SAS-related questions, and I learned many practical skills in a short period.

4.  Make friends with skilled SAS programmers

Learning alone without interacting with others will lead to ignorance.  I have learned a lot from other experienced SAS users and SAS developers.  We share our ideas from time to time, and benefit a lot from the exchange.