2月 142020
 

In honor of Valentine’s day, we thought it would be fitting to present an excerpt from a paper about the LIKE operator because when you like something a lot, it may lead to love! If you want more, you can read the full paper “Like, Learn to Love SAS® Like” by Louise Hadden, which won best paper at WUSS 2019.

Introduction

SAS provides numerous time- and angst-saving techniques to make the SAS programmer’s life easier. Among those techniques are the ability to search and select data using SAS functions and operators in the data step and PROC SQL, as well as the ability to join data sets based on matches at various levels. This paper explores how LIKE is featured in each one of these techniques and is suitable for all SAS practitioners. I hope that LIKE will become part of your SAS toolbox, too.

Smooth Operators

SAS operators are used to perform a number of functions: arithmetic calculations, comparing or selecting variable values, or logical operations. Operators are loosely grouped as “prefix” (for example a sign before a variable) or “infix” which generally perform an operation BETWEEN two variables. Arithmetic operations using SAS operators may include exponentiation (**), multiplication (*), and addition (+), among others. Comparison operators may include greater than (>, GT) and equals (=, EQ), among others. Logical, or Boolean, operators include such operands as || or !!, AND, and OR, and serve the purpose of grouping SAS operations. Some operations that are performed by SAS operators have been formalized in functions. A good example of this is the concatenation operators (|| and !!) and the more powerful CAT functions which perform similar, but not identical, operations. LIKE operators are most frequently utilized in the DATA step and PROC SQL via a DATA step.

There is a category of SAS operators that act as comparison operators under special circumstances, generally in where statements in PROC SQL and the data step (and DS2) and subsetting if statements in the data step. These operators include the LIKE operator and the SOUNDS LIKE operator, as well as the CONTAINS and the SAME-AND operators. It is beyond the scope of this short paper to discuss all the smooth operators, but they are definitely worth a look.

LIKE Operator

Character operators are frequently used for “pattern matching,” that is, evaluating whether a variable value equals, does not equal, or sounds like a specified value or pattern. The LIKE operator is a case-sensitive character operator that employs two special “wildcard” characters to specify a pattern: the percent sign (%) indicates any number of characters in a pattern, while the underscore (_) indicates the presence of a single character per underscore in a pattern. The LIKE operator is akin to the GREP utility available on Unix/Linux systems in terms of its ability to search strings.

The LIKE operator also includes an escape routine in case you need to use a string that includes a comparison operator such as the carat, the underscore, or the percent sign, etc. An example of the escape routine syntax, when looking for a string containing a percent sign, is:

where yourvar like ‘100%’ escape ‘%’;

Additionally, SAS practitioners can use the NOT LIKE operator to select variables WITHOUT a given pattern. Please note that the LIKE statement is case-sensitive. You can use the UPCASE, LOWCASE, or PROPCASE functions to adjust input strings prior to using the LIKE statement. You may string multiple LIKE statements together with the AND or OR operators.

SOUNDS LIKE Operator

The LIKE operator, described above, searches the actual spelling of operands to make a comparison. The SOUNDS LIKE operator uses phonetic values to determine whether character strings match a given pattern. As with the LIKE operator, the SOUNDS LIKE operator is useful for when there are misspellings and similar sounding names in strings to be compared. The SOUNDS LIKE operator is denoted with a short cut ‘-*’. SOUNDS LIKE is based on SAS’s SOUNDEX algorithm. Strings are encoded by retaining the original first column, stripping all letters that are or act as vowels (A, E, H, I, O, U, W, Y), and then assigning numbers to groups: 1 includes B, F, P, and V; 2 includes C, G, J, K, Q, S, X, Z; 3 includes D and T; 4 includes L; 5 includes M and N; and 6 includes R. “Tristn” therefore becomes T6235, as does Tristan, Tristen, Tristian, and Tristin.

For more on the SOUNDS LIKE operator, please read the documentation.

Joins with the LIKE Operator

It is possible to select records with the LIKE operator in PROC SQL with a WHERE statement, including with joins. For example, the code below selects records from the SASHELP.ZIPCODE file that are in the state of Massachusetts and are for a city that begins with “SPR”.

proc sql;
    CREATE TABLE TEMP1 AS
    select
        a.City ,
        a.countynm  , a.city2 ,
         a.statename , a.statename2
    from sashelp.zipcode as a
    where upcase(a.city) like 'SPR%' and 
upcase(a.statename)='MASSACHUSETTS' ; 
quit;

The test print of table TEMP1 shows only cases for Springfield, Massachusetts.

The code below joins SASHELP.ZIPCODE and a copy of the same file with a renamed key column (city --> geocity), again selecting records for the join that are in the state of Massachusetts and are for a city that begins with “SPR”.

proc sql;
    CREATE TABLE TEMP2 AS
    select
        a.City , b.geocity, 
        a.countynm  ,
        a.statename , b.statecode, 
        a.x, a.y
    from sashelp.zipcode as a, zipcode2 as b
    where a.city = b.geocity and upcase(a.city) like 'SPR%' and b.statecode
= 'MA' ;
quit;

The test print of table TEMP2 shows only cases for Springfield, Massachusetts with additional variables from the joined file.

The LIKE “Condition”

The LIKE operator is sometimes referred to as a “condition,” generally in reference to character comparisons where the prefix of a string is specified in a search. LIKE “conditions” are restricted to the DATA step because the colon modifier is not supported in PROC SQL. The syntax for the LIKE “condition” is:

where firstname=: ‘Tr’;

This statement would select all first names in Table 2 above. To accomplish the same goal in PROC SQL, the LIKE operator can be used with a trailing % in a where statement.

Conclusion

SAS provides practitioners with several useful techniques using LIKE statements including the smooth LIKE operator/condition in both the DATA step and PROC SQL. There’s definitely reason to like LIKE in SAS programming.

To learn more about SAS Press, check out our up-and-coming titles, and to receive exclusive discounts make sure to subscribe to our newsletter.

References

    Gilsen, Bruce. September 2001. “SAS® Program Efficiency for Beginners.” Proceedings of the Northeast SAS Users Group Conference, Baltimore, MD.

    Roesch, Amanda. September 2011. “Matching Data Using Sounds-Like Operators and SAS® Compare Functions.” Proceedings of the Northeast SAS Users Group Conference, Portland, ME.

    Shankar, Charu. June 2019. “The Shape of SAS® Code.” Proceedings of PharmaSUG 2019 Conference, Philadelphia, PA.

Learn to Love SAS LIKE was published on SAS Users.

1月 212020
 

One great thing about being a SAS programmer is that you never run out of new things to learn. SAS often gives us a variety of methods to produce the same result. One good example of this is the DATA step and PROC SQL, both of which manipulate data. The DATA step is extremely powerful and flexible, but PROC SQL has its advantages too. Until recently, my knowledge of PROC SQL was pretty limited. But for the sixth edition of The Little SAS Book, we decided to move the discussion of PROC SQL from an appendix (who reads appendices?) to the body of the book. This gave me an opportunity to learn more about PROC SQL.

When developing my programs, I often find myself needing to calculate the mean (or sum, or median, or whatever) of a variable, and then merge that result back into my SAS data set. That would generally involve at least a couple PROC steps and a DATA step, but using PROC SQL I can achieve the same result all in one step.

Example

Consider this example using the Cars data set in the SASHELP library. Among other things, the data set contains the 2004 MSRP for over 400 models of cars of various makes and car type. Suppose you want a data set which contains the make, model, type, and MSRP for the model, along with the median MSRP for all cars of the same make. In addition, you would like a variable that is the difference between the MSRP for that model, and the median MSRP for all models of the same make. Here is the PROC SQL code that will create a SAS data set, MedianMSRP, with the desired result:

*Create summary variable for median MSRP by Make;

PROC SQL;
   CREATE TABLE MedianMSRP AS
   SELECT Make, Model, Type, MSRP,
          MEDIAN(MSRP) AS MedMSRP,
          (MSRP - MEDIAN(MSRP)) AS MSRP_VS_Median
   FROM sashelp.cars
   GROUP BY Make;
QUIT;


 

The CREATE TABLE clause simply names the SAS data set to create, while the FROM clause names the SAS data set to read. The SELECT clause lists the variables to keep from the old data set (Make, Model, Type, and MSRP) along with specifications for the new summary variables. The new variable, MedMSRP, is the median of the old MSRP variable, while the new variable MSRP_VS_Median is the MSRP minus the median MSRP. The GROUP BY clause tells SAS to do the calculations within each value of the variable Make. If you leave off the GROUP BY clause, then the calculations would be done over the entire data set. When you run this code, you will get the following message in your SAS log telling you it is doing exactly what you wanted it to do:

NOTE: The query requires remerging summary statistics back with the original data.

The following PROC PRINT produces a report showing just the observations for two makes – Porsche and Jeep.

PROC PRINT DATA = MedianMSRP;
  TITLE '2004 Car Prices';
  WHERE Make IN ('Porsche','Jeep');
  FORMAT MedMSRP MSRP_VS_Median DOLLAR8.0;
RUN;

Results

Here are the results:

Now PROC SQL aficionados will tell you that if all you want is a report and you don’t need to create a SAS data set, then you can do it all in just the PROC SQL step. But that is the topic for another blog!

 

Expand Your SAS Knowledge by Learning PROC SQL was published on SAS Users.

3月 292018
 

With the release of SAS Viya 3.3, you now have the ability to pass implicit SQL queries to a variety of SQL data sources, including Hive. Under an implicit pass-through, users can write SAS compliant SQL code, and SAS will:

  1. Convert as much code as possible into database native SQL.
  2. Execute the resulting query in-database.
  3. Bring the result back into SAS Viya.

My SAS Viya is co-located within a Hortonworks Hadoop environment. Within this environment, I have set up multiple tables within Hive, which provides structure and a query-like environment for Hadoop data. Using the SAS Data Explorer in SAS Viya, I can easily see the different tables in the Hive environment, and visually inspect them without having to load the data into SAS. The screenshot below shows the Hive table va_service_detail, which contains anonymous data related to recent hospital stays.

SQL Pass-through to Hive in SAS Viya

In my Hive environment, I have a second table called va_member_detail, which contains information about the individuals who were hospitalized in the above table, va_service_detail. A summary of this Hive table can be found in the screenshot below.

Using this data, I would like to perform an analysis to determine why patients are readmitted to the hospital, and understand how we can preventatively keep patients healthy. I will need to join these two tables to allow me to have visit-level and patient-level information in one table. Since medical data is large and messy, I would like to only import the needed information into SAS for my analysis.  The simplest way to do this is through an implicit SQL pass-through to Hive, as shown below:

With an implicit pass-through, I write normal SAS FedSQL code on top of a SAS Library called “Hadoop” pointing to my Hive Server. Once the code is submitted, the SAS System performs the following steps:

  1. Translates the SAS FedSQL code into HiveQL.
  2. Executes the HiveQL script in Hive.
  3. Loads the resulting data in parallel into SAS.

Looking at the log, we can see that the SQL statement was “Fully offloaded to the underlying data source via fill pass-through”, meaning that SAS successfully executed the query, in its entirety, in Hive. With the SAS Embedded Process for Hadoop, the resulting table is then lifted in-parallel from Hive into SAS Viya, making it available for analysis.

As we can see in the log, it took 42 seconds to execute the query in Hive, and bring the result into SAS. To compare efficiency, I redid the analysis, loading va_service_detail and va_member_detail into the memory of the SAS server and performed the join there. The execution took 58 seconds, but required three in-memory tables to do so, along with much more data passing through the network. The implicit pass-through has the benefits of increased speed and decreased latency in data transfer by pushing the query to its source, in this case Hive.

Conclusion

The Implicit SQL Pass-through to Hive in SAS Viya is a must have tool for any analyst working with Hadoop data. With normal SQL syntax in a familiar SAS interface, analysts can push down powerful queries into Hive, speeding up their analysis while limiting data transfer. Analysts can effectively work with large ever-growing data sizes, and speed up the time to value on solving key business challenges.

Implicit SQL Pass-through to Hive in SAS Viya was published on SAS Users.

6月 202014
 
This is a simple tutorial showing how to use SQL to subset data, and then create a histogram using Proc Sgplot, in SAS University Edition. So you've downloaded SAS University Edition, and you're wondering "What now?" -- I would recommend exploring some of the sample data, and creating some simple […]
3月 252014
 

The new book Routine SAS SQL, out today, is especially for people getting started with SQL in SAS. When I wrote it, I started from the “SQL” chapter in Professional SAS Programming Logic, but I added many more details that a SAS SQL coder is likely to need.

This book is also my first e-book. I’ll be watching to see how popular the e-book formats are in comparison to the print format.

Links:
Routine SAS SQL at Amazon (paperback)
http://www.amazon.com/gp/product/ASIN/189195721X
Routine SAS SQL at Kobo
http://store.kobobooks.com/en-US/ebook/routine-sas-sql
Routine SAS SQL for Amazon Kindle
http://www.amazon.com/dp/B00J7QKFM6
Routine SAS SQL for Apple iPad
https://itunes.apple.com/us/book/routine-sas-sql/id845242502
11月 132013
 

Wow did the summer fly by! Now that grad school is over, it’s about time to resume the “it’s all about the data” series. In the last several posts, I tried to lay a foundation for understanding how SAS stores and manages data for use in business intelligence and analytic applications. Now, I’d like to turn your attention to third-party database engines, which is the term (or SAS-ism) for when data is housed in non-SAS data structures. Examples include the obvious choices like Microsoft SQL Server, Oracle, MySQL, Teradata and a dozen or so additional supported third party database interfaces.

This article provides a quick overview of how SAS “talks” to third-party databases. If you are an experienced SAS programmer, you probably know much of what’s covered here. This topic has been written about extensively over the years, so while I wont go into detail on every nuance, I thought (thot!) it would be useful to provide a high level overview for those SAS administrators who are not SAS programmers but have to support the data needs of their user community. The focus here is the necessary preconditions for connectivity and the differences between explicitly passing through your query to the database versus letting SAS handle the negotiations. Let’s get started!

I can’t seem to connect to my database from SAS, what’s wrong?
Whether you are on Windows or UNIX, you need to both an ODBC driver manager and an ODBC driver to be installed. The error in the log below is for SQL Server and shows that the connection could not be made. You need to look at your ODBC.ini file (UNIX) or ODBC Administrator (Windows) and confirm that you have specified the details correctly. For more information on how to connect, take a look at the articles here for UNIX and Windows

 

For Oracle, you need an Oracle client driver installed on the server, and the driver needs to know where to look for the database. This driver is usually defined in a file called tnsnames.ora that lives within the client driver directory. For more information, take a look at the articles here for connecting to Oracle databases on UNIX and Windows. Note that you can use either SAS/Access to Oracle (native driver) or SAS/Access to ODBC to make these connections from within SAS.

What is SQL Pass-thru and why should I care?
In SAS, you also have the option of using two features to access your third-party database:  libname access or explicit pass-thru capabilities.

Libname access is just like it sounds – you simply use the LIBNAME statement in SAS and specify the engine along with your credentials and database specific attributes. Once the libref is instantiated, then you can use references to the library just as if it were a SAS library. Here is a quick example:

Explicit pass-thru, on the other hand, is different in that you are not letting SAS do any of the database negotiations for you. Instead, you tell SAS to pass through everything in your program directly to the database for processing. Here is the same connection to sqlsvr, this time using explicit pass-thru.

This is perhaps one of the better-documented SAS capabilities as it relates to best practice. For example, Jessica Hampton discusses SQL pass-thru in her NESUG 2011 paper "SQL Pass-Through and the ODBC Interface". Here, she compares the libname method (implicit) versus explicit pass-thru method. 

There are so many options for connecting SAS to my database, which is the best?
This is a common question. After all, you can use Microsoft SQL Server using ODBC, SAS/Access to Microsoft SQL Server and OLE DB. For Oracle, you also have the option of using the native driver as well as ODBC. You’ll also see that this is true for many other database such as Teradata, mySQL, and so on.  Generally the choice comes down to performance or flexibility:

  • The best performance will always be the native driver for that database.
  • However, if you want the flexibility of talking to lots of databases and don’t mind having to find and purchase individual ODBC drivers, then the more general SAS/Access to ODBC may be right for you.

One word of caution, however:  if you use explicit SQL pass-thru, then you have to comply with the rules of the database driver that you are using. For example, there are lots of capabilities within the Oracle SQL language that are not available in ODBC. So if you want the full power of using database-specific language elements, then you will want to go with the native drivers.

What is the best option for connecting to Microsoft SQL Server?
For Microsoft SQL Server, people often ask if they should use SAS/Access to ODBC or the native SAS/Access to SQL Server engine? One of the best “how-to” articles on SAS and Microsoft SQL Server can be found in the SAS Tech Note "Accessing a Microsoft SQL Server Database from SAS on Microsoft Windows".  Although this note is a bit dated, it provides a good foundation for most users.

What might be surprising to some is that the native engine is actually a Microsoft SQL Server ODBC driver that is shipped with the SAS/Access product. SAS licenses the DataDirect ODBC driver for you so that you don’t have to find and eventually purchase another ODBC driver for your environment.

 The DataDirect drivers have been proven to work efficiently with SAS, and they go through extensive testing before each version of SAS is released. On the other hand, if you want the flexibility of using your own ODBC drivers or have a need to connect to other database via ODBC, then SAS/Access to ODBC might be the route for you.

In the SAS Community, you may see papers that talk about Microsoft SQL Server that use ODBC and some use the native access engine:  sqlsvr. Be sure to use the engine that you have licensed as well as the syntax appropriate to that engine.

There are other methods that can be used to get data in and out of Microsoft SQL Server from SAS such as OLE DB, but I’ll leave that to you to read "Getting Started with SAS/ACCESS Interface to OLE DB" as homework.

 Can I use SAS/Access to ODBC and SAS/Access to Microsoft SQL Server on the same machine?
A slightly more confusing answer comes when people want to use both SAS/Access to ODBC and SAS/Access to Microsoft SQL Server on the same machine. Without going into a detailed explanation, the short answer is “yes – you can, but you might not want to do that.”

Supporting SAS/Access products on the same machine requires that you set up the environment so that the drivers are defined correctly in the odbc.ini file and may necessitate using two startup scripts when launching SAS depending on how you want to install and configure. As long as you are precise about where you install the software and how the environment variables are instantiated, it should work.

Where do I go from here?
Obviously this topic is huge! Since there are already so many good references around this topic, I’ll just highlight a few more:

For those new SAS administrators, are there other questions that you have about accessing data from third party relational database management systems?

Remember--Happy Data, Happy Users!

--greg

tags: database engines, Oracle, SAS Administrators, sql
6月 282013
 

Many SAS Enterprise Guide users practically live in the Query Builder. For those who understand their data tables, the Query Builder provides a tremendous amount of flexibility to pull and manipulate data. The Query Builder produces SQL programs behind the scenes, which translates well for database-centric work.

Sometimes a complex query requires a two-pass process. For example, suppose that you want to create a table of CARS that contains only those models with a higher-than-average price. First you need a query to calculate the average price across all cars. Then you need a second query to filter the records with a price that's higher than that result.

A SAS Enterprise Guide user might build something like this:

This method has obvious downsides: it requires two passes through the data, and two sets of results are brought into your SAS session. If the source table is in a database, this can be especially inefficient.

SAS Enterprise Guide 5.1 supports the idea of subqueries, a concept familiar to SQL programmers. With a subquery (sometimes called "subselect" or "inner query"), you can express the entire operation in one SQL step. That's one trip to the database, and only one set of results brought back to SAS.

I'll briefly describe how this works. For comprehensive documentation and examples about the subquery feature in SAS Enterprise Guide 5.1 and later, read Michael Burke's excellent conference paper on Finding Your Inner Query with SAS Enterprise Guide.

The subquery feature relies on the ability to define query templates (also new in SAS Enterprise Guide 5.1), which are query definitions that you can reuse as building blocks for other query steps. (Query templates are useful on their own, as a way to store a query within your task menus so that you can use it across different projects.)

Using the subquery feature

First, use the Query Builder to define the "inner query". In my example, that query contains just one item, the calculated average of MSRP.

Instead of running the query, use the drop-down arrow on the Run button to select Create Template. The Create Template window allows you to assign a name to this definition, and it's automatically added to your Tasks->Task Templates menu. Note that you can create the query template without even running this part of the query. That can be a time saver if you're a Query Builder Ninja with high confidence in your result.

Next, design the outer query with a new Query Builder task. Select all of the fields that you want to include in the result. In the Filter tab, add a new filter for MSRP. Instead of comparing MSRP to a constant or other expression, you'll select from a list of available Subquery templates.

In your Filter tab, you'll see a partial view of the inner SELECT expression.

When you complete your selections and have the other query options just the way that you want them, click Run to execute the query.

If you examine the SAS program that the task creates, you'll see the inner SELECT clause:

PROC SQL;
   CREATE TABLE WORK.expensive AS 
   SELECT t1.Make, 
          t1.Model, 
          t1.MSRP, 
      FROM SASHELP.CARS t1
      WHERE t1.MSRP >= (
         SELECT /* AVG_of_MSRP */
           (AVG(t1.MSRP)) AS AVG_of_MSRP
             FROM SASHELP.CARS t1
      );
QUIT;

You can also use subqueries as a way to filter/compare a column to a list of values. In that case, your inner query simply needs to have a single-column list of results (instead of a single number, as in my example). Read Michael's subquery paper for all of the possibilities and limitations. Here's an executive summary from Michael's paper:

Here are the types of subqueries that you can create using query-based templates:

  • Subqueries that return a single value
  • Subqueries that return multiple values (multiple rows of a single field)
  • Subqueries that appear as part of a filter of the raw data (on the WHERE clause)
  • Subqueries that appear as part of a filter on the grouped data (on the HAVING clause)
  • Subqueries that appear as part of a recode condition as part of recoding a column in a computed column (on the SELECT clause)

Here are the types of subqueries that you cannot create using query-based templates:

  • Subqueries that form a derived table (subqueries that appear on the FROM clause)
  • Subqueries that refer to columns on the outer query (correlated subqueries)

More than one way to skin this data

There are many other ways to accomplish this simple example in SAS. For example, you may be able to use the HAVING clause feature in the Query Builder, or use SQL joins and special join conditions to filter a table based on another result set. But the ability to create a subquery has been long-requested from SAS Enterprise Guide users who understand databases. I hope you'll agree that it's an important addition to your query and reporting toolbox.

tags: SAS Enterprise Guide, sql, subquery