
July 10, 2019
 

Posted on behalf of SAS Press author Derek Morgan.


I was sitting in a model railroad club meeting when one of our more enthusiastic young members said, "Wouldn't it be cool if we could make a computer simulation, with trains going between stations and all. We could have cars and engines assigned to each train and timetables and…"

So, I thought to myself, “Timetables… I bet SAS can do that easily… sounds like something fun for Mr. Dates and Times."

As it turns out, the only easy part of creating a timetable is calculating the time. SAS handles the concept of elapsed time smoothly. It’s still addition and subtraction, which is the basis of how dates and times work in SAS. If a train starts at 6:00 PM (64,800 seconds of the day) and arrives at its destination 12 hours (43,200 seconds) later, it arrives at 6:00 AM the next day. The math is start time + duration = end time (108,000 seconds), which is 6:00 AM the next day. It doesn’t matter which day; that train is always scheduled to arrive at 6:00 AM, 12 hours after it left.
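Under the hood this is ordinary arithmetic on seconds. Here is a minimal sketch of that calculation in a DATA step (the variable names are mine; the MOD call wraps the total past midnight back to a time of day):

data _null_;
   start    = '18:00:00't;         /* 64,800 seconds into the day */
   duration = '12:00:00't;         /* 43,200 seconds              */
   arrive   = start + duration;    /* 108,000 seconds             */
   clock    = mod(arrive, 86400);  /* wrap past midnight          */
   put arrive= clock= timeampm11.;
run;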

It got a lot more complicated when the group grabbed onto the idea. One of the things they wanted to do was to add or delete trains and adjust the timing so multiple trains don’t run over the same track at the same time. This wouldn’t be that difficult in SAS; just create an interactive application. But… I’m the only one who has SAS. So how do I get my SAS database to communicate with the “outside world”? The answer was Microsoft Excel, and this is where it gets thorny.

It’s easy enough to send SAS data to Excel using ODS EXCEL and PROC REPORT, but how could I get Excel to allow the club to manipulate the data I sent?
I used the COMPUTE block in PROC REPORT to display a formula for every visible train column. I duplicated the original columns (with corresponding names to keep it all straight) and hid them in the same spreadsheet. The Excel formula code is in line 8 of the compute block code below.

Compute Block Code:

I also added three rows to the dataset at the top. The first contains the original start time for each train, the second contains an offset, which is always zero in the beginning, and the third row is blank (and formatted with a black background) to separate these rows from the actual schedule.


Figure 1: Schedule Adjustment File

The users can change the offset to change the starting time of a train (Column C, Figure 2). The formula in the visible columns adds the offset to the value in each cell of the corresponding hidden column (as long as it isn’t blank). You can’t simply add the offset to the value of the visible cell, because that would be a circular reference.

The next problem was moving a train to an earlier starting time, because Excel has no concept of negative time (or date; a date prior to the Excel reference date of January 1, 1900 becomes a character value in Excel and causes your entire column to be imported into SAS as character data). Similarly, you can’t enter -1:00 as an offset to move the starting time of our 5:35 AM train to 4:35 AM. Excel treats “-1:00” as a character value, and that causes a calculation error in Excel. In order to move that train to 4:35 AM, you have to add 23 hours to the original starting time (Column D, Figure 2).
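The workaround is simple arithmetic: to shift a train earlier, enter 24 hours minus the backward shift. A quick sketch in SAS of the value you would type into the offset cell:

data _null_;
   shift  = '01:00:00't;        /* want the train 1 hour earlier         */
   offset = 24*60*60 - shift;   /* 82,800 seconds, displays as 23:00:00  */
   put offset= time8.;
run;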


Figure 2: Adjusting Train Schedules

After the users adjust the schedules, it’s time to return our Excel data to SAS, which creates more challenges. In the screenshot above, T534_LOC is the identifier of a specific train, and the timetable is kept as SAS time values. Unfortunately, PROC IMPORT using DBMS=XLSX brings the train columns into SAS as character data. T534_LOC also imports as the raw Excel value: time as a fraction of a day.


Figure 3: How the Schedule Adjustment File Imports to SAS

While I can fix that by converting the character data to numeric and multiplying by 86,400, I still need the original column name of T534_LOC for the simulation, so I would have to rename each character column and output the converted data to the original column name. There are currently 146 trains spread across 12 files, and that is a lot of work for something that was supposed to be easy! Needless to say, this “little” side project, like most model railroads, is still in progress. However, this exercise in moving time data between Microsoft Excel and SAS gave me even more appreciation for the way SAS handles date and time data.
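As a hedged sketch of that repair for a single column (the data set names and the _char suffix are hypothetical stand-ins for the real schedule files):

data work.schedule_fixed;
   /* Rename the imported character column to free up the original name */
   set work.imported(rename=(T534_LOC=T534_LOC_char));
   /* Convert the Excel day fraction to SAS time in seconds */
   T534_LOC = input(T534_LOC_char, best32.) * 86400;
   format T534_LOC time8.;
   drop T534_LOC_char;
run;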

Figure 4 is a partial sample of the finished timetable file, generated as an RTF file using SAS. The data for trains 534 and 536 are from the spreadsheet in Figure 1.


Figure 4: Partial Sample Timetable

Want to learn more about how to use and manipulate dates, times, and datetimes in SAS? You'll find the answers to this question and much more in my book The Essential Guide to SAS Dates and Times, Second Edition. Updated for SAS 9.4, with additional functions, formats, and capabilities, the Second Edition has a new chapter dedicated to the ISO 8601 standard and the formats and functions that are new to SAS, including how SAS works with Coordinated Universal Time (UTC). Chapter 1 is available as a free preview here.

For updates on new SAS Press books and great discounts, subscribe to the SAS Press New Book Newsletter.

SAS Press Author Derek Morgan on Timetables and Model Trains was published on SAS Users.

July 3, 2019
 

One of my favorite parts of summer is a relaxing weekend by the pool. Summer is the time I get to finally catch up on my reading list, which has been building over the year. So, if expanding your knowledge is a goal of yours this summer, SAS Press has a shelf full of new titles for you to explore. To help you navigate your selection, we asked some of our authors which SAS books are on their reading lists this summer.

Teresa Jade


Teresa Jade, co-author of SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, has already started The DS2 Procedure: SAS Programming Methods at Work by Peter Eberhardt. Teresa reports that the book “is a concise, well-written book with good examples. If you know a little bit about the SAS DATA step, then you can leverage what you know to more quickly get up to speed with DS2 and understand the differences and benefits.”

Derek Morgan

Derek Morgan, author of The Essential Guide to SAS® Dates and Times, Second Edition, tells us his go-to books this summer are Art Carpenter’s Complete Guide to the SAS® REPORT Procedure and Kirk Lafler's PROC SQL: Beyond the Basics Using SAS®, Third Edition. He also notes that he “learned how to use hash objects from Don Henderson’s Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study.”

Chris Holland, co-author of Implementing CDISC Using SAS®: An End-to-End Guide, Revised Second Edition, recommends Richard Zink’s JMP and SAS book, Risk-Based Monitoring and Fraud Detection in Clinical Trials Using JMP® and SAS®, which describes how to improve efficiency while reducing costs in trials with centralized monitoring techniques.

Chris Holland co-author of Implementing CDISC Using SAS®: An End-to-End Guide, Revised Second Edition, recommends Richard Zink’s JMP and SAS book, Risk-Based Monitoring and Fraud Detection in Clinical Trials Using JMP® and SAS®, which describes how to improve efficiency while reducing costs in trials with centralized monitoring techniques.

And our recommendations this summer?

Download our two new free e-books which illustrate the features and capabilities of SAS® Viya®, and SAS® Visual Analytics on SAS® Viya®.

Want to be notified when new books become available? Sign up to receive information about new books delivered right to your inbox.

Summer reading – Book recommendations from SAS Press authors was published on SAS Users.

June 26, 2019
 

In his article How to use CASL to develop and work with user-defined CAS actions, Brian Kinnebrew defines CASL as "a language specification used by the SAS client to interact with and provide easy access to Cloud Analytic Services (CAS). CASL is a statement-based scripting language with many uses and strengths." I can't come up with a better definition, so if you'd like to learn more about the basics of CASL, I encourage you to read Brian's post.

In a SAS Stored Process (or any traditional SAS program), the code typically has multiple DATA steps and procedures, with a good dose of macros.

Over the last couple of years, the focus of my projects has been the use of CAS actions, with no traditional procedures in the mix. Most of these projects involved web applications calling multiple CAS actions. My initial approach was to make multiple HTTP calls, one per action. This could get tedious.

Then I met my action hero: sccasl.runCasl.

My favorite SAS CASL action

This action executes a CASL "script" on the CAS server analogous to executing a SAS Stored Process. Running a CASL program with a mix of CAS actions and CASL statements on the CAS server has these benefits:

  1. It reduces the number of HTTP calls to the server
  2. It makes the client-side code much easier to reason about
  3. It can return a dictionary suited for further consumption by the client, simplifying the client code

My personal name for these scripts is "CAS stored process".

Where art thou, Macro?

In many applications, user input is passed to the code running on the server. In a SAS Stored Process, macros pass the parameters. CASL has no macros. My initial approaches to passing parameters to CASL programs were:

  1. Generate the final CASL program in JavaScript with the user input values inserted into the code. Sample code is available here.
    • Drawback: Debugging the code in SAS Studio requires a cut-and-paste of the generated code into SAS Studio.
  2. Load the data into a CAS table using the table.upload action for programs needing an input table. Sample code is available here.
    • Drawback: This requires an additional http call to the server.

In developing the GraphQL approach to writing applications - GraphQL and SAS Viya applications - a good match - I addressed the two drawbacks listed above by creating two functions and putting them in my utility belt.

Superfriend functions

  • jsonToDict.js - generates a string containing the CASL dictionary version of a JavaScript object
  • argsToTable - creates a CAS table from a dictionary

The remainder of this article discusses these two functions. To demonstrate the functions' usage, I use code listings from the scoring example, covered in the GraphQL example.

The jsonToDict function

This function produces a string containing a CASL dictionary suitable for inclusion in CASL code.

Function definition

jsonToDict ⇒ string

Returns: string - the string containing the CASL dictionary

Param  Type    Description
obj    object  the JavaScript object of interest
name   string  the name to assign to the dictionary

 

Example function code

The code below outlines the jsonToDict function usage.

let obj = {x: 1, b: 2, c: ['a', 'b']};
let r = jsonToDict(obj, '_appEnv_');

The result is:

r = `_appEnv_ = {x=1, b=2, c={"a", "b"}};`;

The following lists the input parameters passed to the CASL code for scoring.

let input = {
        JOB    : 'J1',
        CLAGE  : 100, 
        CLNO   : 20, 
        DEBTINC: 20, 
        DELINQ : 2, 
        DEROG  : 0, 
        MORTDUE: 4000, 
        NINQ   : 1,
        YOJ    : 10,
        LOAN   : 1000,
        ASSET  : 100000
    };

Below is the JavaScript code and the result.

let _args_ = jsonToDict(input, '_args_');
results in
 
_args_ = `_args_ = {  JOB= "J1" ,CLAGE=100  ,CLNO=20  ,DEBTINC=20  ,DELINQ=2  ,DEROG=0  ,
MORTDUE=4000  ,NINQ=1  ,YOJ=10  ,LOAN=1000  ,ASSET=100000  };`;

To allow the use of different versions of the model, the name of the scoring model is passed in as a parameter. The JavaScript for the model follows.

let env = {
     astore: {
            caslib: 'Public',
            name  : 'GRADIENT_BOOSTING___BAD_2'
        }
};
let _appEnv_= jsonToDict(env, '_appEnv_');
 
resulting in:

_appEnv_ = `_appEnv_ = { astore = { caslib="Public", name="GRADIENT_BOOSTING___BAD_2"}};`;

Next, I prepend the strings to the CASL program in the client code with the following code snippet.

let code = _args_ + _appEnv_ + `
loadactionset "astore";
 
/* convert arguments to a cas table */
argsToTable(_args_, 'casuser', 'INPUTDATA' );
 
/* score */
action astore.score r=rc/
    table  = { caslib= 'casuser', name = 'INPUTDATA' } 
    rstore = { caslib= _appEnv_.astore.caslib,  name=_appEnv_.astore.name }
    casout = { caslib = 'casuser', name = 'OUTPUTDATA', replace = TRUE};
 
/* fetch results */
action table.fetch r = result /
    table = { caslib = 'casuser', name = 'OUTPUTDATA' };
 
/* extract the score and send it as a dictionary */
score = result.Fetch[1].P_BAD;
send_response({score = score});
`;

Now the CASL program can access the incoming information using the two dictionaries _args_ and _appEnv_. Note: as a personal choice, I use a convention of _args_ for user input and _appEnv_ for application-specific information. I use the restaf application framework to make the HTTP calls, as shown below.

let payload = {
        action: 'sccasl.runCasl',
        data  : { code: code}
    }
let result = await store.runAction(session, payload);
let score = result.items('results', 'score');

CASL coding - easier than saying Kilp-ill-skim

Notice the absence of string substitutions that look strange. Just simple, straightforward coding. Easier than saying your name backwards, forcing you back to the fifth dimension.

The argsToTable.casl function

The sample code above used the function argsToTable. As you may guess, it converts the _args_ dictionary into the CAS table used in the scoring action. argsToTable is the CASL function that handles this task.

Function definition

argsToTable ⇒ loads a dictionary into a CAS table

Param   Type        Description
input   dictionary  the data to load
caslib  string      the caslib of the output table
name    string      the name of the output table

 

The relevant CASL code from the scoring example above is reproduced here:

argsToTable(_args_, 'casuser', 'INPUTDATA' );

The argsToTable function is either stored on a server or prepended to the CASL code sent to the runCasl action. This function removes the need for an additional HTTP call to load the data into a CAS table.

Returning data from CAS - the send_response function

The ultimate sidekick function

Any good superhero has a sidekick. The send_response function in CASL is very versatile - it allows you to return data in the form the application needs, and it allows more than one result. In many programs I return data in a form easily consumed by the client code.

For example, if you want to return just the rows of a table, you can do the following:

function resultsToDict(r);
    casResults = {};
    i = 1;
    do row over r;
        casResults[i] = row;
        i = i + 1;
    end;
    return casResults;
end;
 
/* and use it as follows: */
 
action table.fetch r = result /
    table = {  caslib = 'casuser',  name = 'mydata' };
/* extract the data and return it as a dictionary */
casResults = resultsToDict(result.Fetch);
send_response({casResults = casResults});

Finally

Using a combination of CASL, a couple of utility functions, and the runCasl action, you can develop some very efficient programs with minimal traffic between your client and the server. If you run multiple actions in sequence, you should consider grouping them into a CASL program and executing them on the CAS server using the runCasl action.

Next

In my next article, which I hope to finish in a flash, I will discuss using the runCasl action to create a browser for CAS tables with support for pagination.

All comments are welcome. Please feel free to clone the code, make it better.

Cheers...
Deva

Let runCasl be your BFF and favorite action hero

"CAS Stored Process" with my Favorite Action Hero runCasl was published on SAS Users.

June 18, 2019
 

What is Item Response Theory?

Item Response Theory (IRT) is a way to analyze responses to tests or questionnaires with the goal of improving measurement accuracy and reliability.

A common application is in testing a student’s ability or knowledge. Today, all major psychological and educational tests are built using IRT. The methodology can significantly improve measurement accuracy and reliability while providing potentially significant reductions in assessment time and effort, especially via computerized adaptive testing. For example, the SAT and GRE both use Item Response Theory for their tests. IRT takes into account both the number of questions answered correctly and the difficulty of each question.

In recent years, IRT models have also become increasingly popular in health behavior, quality of life, and clinical research. There are many different IRT models. Three of the most popular are:

The Rasch model

The two-parameter model

The graded response model

Early IRT models (such as the Rasch model and the two-parameter model) concentrate mainly on dichotomous responses. These models were later extended to incorporate other formats, such as ordinal responses, rating scales, partial credit scoring, and multiple category scoring.
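To make this concrete, here is a minimal sketch of fitting a two-parameter model with PROC IRT, assuming a hypothetical data set WORK.RESPONSES that holds binary items ITEM1-ITEM10:

proc irt data=work.responses;
   model item1-item10 / resfunc=twop;  /* two-parameter logistic model */
run;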

Item Response Theory Models Using SAS

Ron Cody and Jeffrey K. Smith’s book, Test Scoring and Analysis Using SAS, uses SAS PROC IRT to show how to develop your own multiple-choice tests, score students, produce student rosters (in print form or Excel), and explore item response theory (IRT).

Aimed at non-statisticians working in education or training, the book describes item analysis and test reliability in easy-to-understand terms and teaches SAS programming to score tests, perform item analysis, and estimate reliability.

For those with a more statistical background, Bayesian Analysis of Item Response Theory Models Using SAS describes how to estimate and check IRT models using the SAS MCMC procedure. Written especially for psychometricians, scale developers, and practitioners, it provides numerous annotated programs that you can easily modify for your applications.

Assessment has played, and continues to play, an integral part in our work and educational settings. IRT models continue to be increasingly popular in many other fields, such as medical research, health sciences, quality-of-life research, and even marketing research. With the use of IRT models, you can not only improve scoring accuracy but also economize test administration by adaptively using only the discriminative items.

Interested in learning more? Check out our chapter previews available for free. Want to learn more about SAS Press? Explore our online bookstore and subscribe to our newsletter to get all the latest discounts, news, and more.

Further resources

SAS Blogs:
New at SAS: Psychometric Testing by Charu Shankar
SAS author’s tip: Bayesian analysis of item response theory models

SAS Communities:
Custom Task Tuesday: SAS Global Forum/PROC IRT Edition!

SAS Global Forum Paper:
Item Response Theory: What It Is and How You Can Use the IRT Procedure to Apply It by Xinming An and Yiu-Fai Yung

SAS Documentation:
The IRT Procedure
SAS/STAT 14.1 User Guide: The IRT Procedure
SAS/STAT 14.2 User Guide: Help Center

Understanding Item Response Theory with SAS was published on SAS Users.

June 6, 2019
 

Want to learn SAS programming but worried about taking the plunge? Over at SAS Press, we are excited about an upcoming publication that introduces newbies to SAS in a peer-review instruction format we have found popular for the classroom. Professors Jim Blum and Jonathan Duggins have written Fundamentals of Programming in SAS using a spiral curriculum that builds upon topics introduced earlier and at its core, uses large-scale projects presented as case studies. To get ready for release, we interviewed our new authors on how their title will enhance our SAS Books collection and which of the existing SAS titles has had an impact on their lives!

What does your book bring to the SAS collection? Why is it needed?

Blum & Duggins: The book is probably unique in the sense that it is designed to serve as a classroom textbook, though it can also be used as a self-study guide. That also points to why we feel it is needed; there is no book designed for what we (and others) do in the classroom. As SAS programming is a broad topic, the goal of this text is to give a complete introduction of effective programming in Base SAS – covering topics such as working with data in various forms and data manipulation, creating a variety of tabular and visual summaries of data, and data validation and good programming practices.

The book pursues these learning objectives using large-scale projects presented as case studies. The intent of coupling the case-study approach with the introductory programming topics is to create a path for a SAS programming neophyte to evolve into an adept programmer by showing them how programmers use SAS, in practice, in a variety of contexts. The reader will gain the ability to author original code, debug pre-existing code, and evaluate the relative efficiencies of various approaches to the same problem using methods and examples supported by pedagogical theory. This makes the text an excellent companion to any SAS programming course.

What is your intended audience for the book?

Blum & Duggins: This text is intended for use in both undergraduate and graduate courses, without the need for previous exposure to SAS. However, we expect the book to be useful for anyone with an aptitude for programming and a desire to work with data, as a self-study guide to work through on their own. This includes individuals looking to learn SAS from scratch or experienced SAS programmers looking to brush up on some of the fundamental concepts. Very minimal statistical knowledge, such as elementary summary statistics (e.g., means and medians), is needed. Additional knowledge (e.g., hypothesis testing or confidence intervals) could be beneficial but is not expected.

What SAS book(s) changed your life? How? And why?

Blum: I don’t know if this qualifies, but the SAS Programming I and SAS Programming II course notes fit this best. With those, and the courses, I actually became a SAS programmer instead of someone who just dabbled (and dabbled ineffectively). From there, many doors were opened for me professionally and, more importantly, I was able to start passing that knowledge along to students and open some doors for them. That experience also served as the basis for building future knowledge and passing it along, as well.

Duggins: I think the two SAS books that most changed my outlook on programming (which I guess has become most of my life, for better or worse) would either be The Essential PROC SQL Handbook for SAS Users by Katherine Prairie or Jane Eslinger's The SAS Programmer's PROC REPORT Handbook, because I read them at different times in my SAS programming career. Katherine's SQL book changed my outlook on programming because, until then, I had never done enough SQL to consistently consider it as a viable alternative to the DATA step. I had taken a course that taught a fair amount of SQL, but since I had much more experience with the DATA step and since that is what was emphasized in my doctoral program, I didn't use SQL all that often. However, after working through her book, I definitely added SQL to my programming arsenal. I think learning it, and then having to constantly evaluate whether the DATA step or SQL was better suited to my task, made me a better all-around programmer.

As for Jane's book - I read it much later after having used some PROC REPORT during my time as a biostatistician, but I really wasn't aware of how much could be done with it. I've also had the good fortune to meet Jane, and I think her personality comes through clearly - which makes that book even more enjoyable now than it was during my first read!

Read more

We at SAS Press are really excited to add this new release to our collection and will continue releasing teasers until its publication. For almost 30 years SAS Press has published books by SAS users for SAS users. Here is a free excerpt located on Duggins' author page to whet your appetite. (Please know that this excerpt is an unedited draft and not the final content.) Look out for news on this new publication; you will not want to miss it!

Want to find out more about SAS Press? For more about our books, subscribe to our newsletter. You’ll get all the latest news and exclusive newsletter discounts. Also, check out all our new SAS books at our online bookstore.

Interview with new SAS Press authors: Jim Blum and Jonathan Duggins was published on SAS Users.

June 4, 2019
 


Two sayings I’ve heard countless times throughout my life are “Work smarter, not harder,” and “Use the best tool for the job.” If you need to drive a nail, you pick up a hammer, not a wrench or a screwdriver. In the programming world, this could mean using an existing function library instead of writing your own or using an entirely different language because it’s more applicable to your problem. While that sounds good in practice, in the workplace you don’t always have that freedom.

So, what do you do when you’re given a hammer and told to fasten a screw? Or, as in the case of this article’s title, what do you do when you have Python functions you want to use in SAS?

Recently I was tasked with documenting an exciting new feature for SAS — the ability to call Python functions from within SAS. In this article I will highlight everything I’ve learned along the way to bring you up to speed on this powerful new tool.

PROC FCMP Python Objects

Starting with the May 2019 release of SAS 9.4M6, the FCMP procedure added support for submitting and executing functions written in Python from within a SAS session using the new Python object. If you’re unfamiliar with PROC FCMP, I’d suggest reading the documentation. In short, FCMP, or the SAS Function Compiler, enables users to write their own functions and subroutines that can then be called from just about anywhere a SAS function can be used in SAS. Users are not restricted to using Python only inside a PROC FCMP statement. You can create an FCMP function that calls Python code, and then call that FCMP function from the DATA step. You can also use one of the products or solutions that support Python objects, including SAS High Performance Risk and SAS Model Implementation Platform.

The Why and How

So, what made SAS want to include this feature in our product? The scenario we imagined when creating this feature was a customer who already had resources invested in Python modeling libraries but now wanted to integrate those libraries into their SAS environment. As much fun as it sounds to convert and validate thousands of lines of Python code into SAS code, wouldn’t it be nice if you could simply call Python functions from SAS? Whether you’re in the scenario above with massive amounts of Python code, or you’re simply more comfortable coding in Python, PROC FCMP is here to help you. Your Python code is submitted to a Python interpreter of your choice. Results are packaged into a Python tuple and brought back inside SAS for you to continue programming.

Programming in Two Languages at Once

So how do you program in SAS and Python at the same time? Depending on your installation of SAS, you may be ready to start, or there could be some additional environment setup you need to complete first. In either case, I recommend pulling up the Using PROC FCMP Python Objects documentation before we continue. The documentation outlines the output string that must be added to your Python code before it can be submitted from SAS. When you call a Python function from SAS, the return value(s) are stored in a SAS dictionary. If you’re unfamiliar with SAS dictionaries, you can read more about them in Dictionaries: Referencing a New PROC FCMP Data Type.

Getting Started

There are multiple methods to load your Python code into the Python object. In the code example below, I’ll use the SUBMIT INTO statement to create an embedded Python block and show you the basic framework needed to execute Python functions in SAS.

/* A basic example of using PROC FCMP to execute a Python function */
proc fcmp;
 
/* Declare Python object */
declare object py(python);
 
/* Create an embedded Python block to write your Python function */
submit into py;
def MyPythonFunction(arg1, arg2):
	"Output: ResultKey"
	Python_Out = arg1 * arg2
	return Python_Out
endsubmit;
 
/* Publish the code to the Python interpreter */
rc = py.publish();
 
/* Call the Python function from SAS */
rc = py.call("MyPythonFunction", 5, 10);
 
/* Store the result in a SAS variable and examine the value */
SAS_Out = py.results["ResultKey"];
put SAS_Out=;
run;

You can gather from this example that there are essentially five parts to using PROC FCMP Python objects in SAS:

  1. Declaring your Python object.
  2. Loading your Python code.
  3. Publishing your Python code to the interpreter.
  4. Executing your Python Code.
  5. Retrieving your results in SAS.

From the SAS side, those are all the pieces you need to get started importing your Python code. Now what about more complicated functions? What if you have working models made using thousands of lines and a variety of Python packages? You still use the same program structure as before. This time I’ll be using the INFILE method to import my Python function library by specifying the file path to the library. You can follow along by copying my Python code into a .py file. The file, blackscholes.py, contains this code:

def internal_black_scholes_call(stockPrice, strikePrice, timeRemaining, volatility, rate):
    import numpy
    from scipy import stats
    import math
    if ((strikePrice != 0) and (volatility != 0)):
        d1 = (math.log(stockPrice/strikePrice) + (rate + (volatility**2)\
                       /  2) * timeRemaining) / (volatility*math.sqrt(timeRemaining))
        d2 = d1 - (volatility * math.sqrt(timeRemaining))
        callPrice = (stockPrice * stats.norm.cdf(d1)) - \
        (strikePrice * math.exp( (-rate) * timeRemaining) * stats.norm.cdf(d2))
    else:
        callPrice=0
    return (callPrice)
 
def black_scholes_call(stockPrice, strikePrice, timeRemaining, volatility, rate):
    "Output: optprice"
    import numpy
    from scipy import stats
    import math
    optPrice = internal_black_scholes_call(stockPrice, strikePrice,\
                                           timeRemaining, volatility, rate)
    callPrice = float(optPrice)
    return (callPrice,)

My example isn’t quite 1000 lines, but you can see the potential of having complex functions all callable inside SAS. In the next figure, I’ll call these Python functions from SAS.

/* Using PROC FCMP to execute Python functions from a file */
proc fcmp;
 
/* Declare Python object */
declare object py(python);
 
/* Use the INFILE method to import Python code from a file */
rc = py.infile("C:\Users\PythonFiles\blackscholes.py");
 
/* Publish the code to the Python interpreter */
rc = py.publish();
 
/* Call the Python function from SAS */
rc = py.call("black_scholes_call", 132.58, 137, 0.041095, .2882, .0222);
 
/* Store the result in a SAS variable and examine the value */
SAS_Out = py.results["optprice"];
put SAS_Out=;
run;

Calling Python Functions from the DATA step

You can take this a step further and make it usable in the DATA step, outside of a PROC FCMP statement. We can use our program from the previous example as a starting point. From there, we just need to wrap the inner Python function call in an outer FCMP function. This function-within-a-function design may be giving you flashbacks of Inception, but I promise you this exercise won’t leave you confused and questioning reality. Even if you’ve never used FCMP before, creating the outer function is straightforward.

/* Creating a PROC FCMP function that calls a Python function  */
proc fcmp outlib=work.myfuncs.pyfuncs;
 
/* Create the outer FCMP function */
/* These arguments are passed to the inner Python function */
function FCMP_blackscholescall(stockprice, strikeprice, timeremaining, volatility, rate);
 
/* Create the inner Python function call */
/* Declare Python object */
declare object py(python);
 
/* Use the INFILE method to import Python code from a file */
rc = py.infile("C:\Users\PythonFiles\blackscholes.py");
 
/* Publish the code to the Python interpreter */
rc = py.publish();
 
/* Call the Python function from SAS */
/* Since this the inner function, instead of values in the call           */
/* you will pass the outer FCMP function arguments to the Python function */
rc = py.call("black_scholes_call", stockprice, strikeprice, timeremaining, volatility, rate);
 
/* Store the inner function Python output in a SAS variable                              */
FCMP_out = py.results["optprice"];
 
/* Return the Python output as the output for outer FCMP function                        */
return(FCMP_out);
 
/* End the FCMP function                                                                 */
endsub;
run;
 
/* Specify the function library you want to call from                                    */
options cmplib=work.myfuncs;
 
/*Use the DATA step to call your FCMP function and examine the result                    */
data _null_;
   result = FCMP_blackscholescall(132.58, 137, 0.041095, .2882, .0222);
   put result=;
run;

With your Python function neatly tucked away inside your FCMP function, you can call it from the DATA step. You also effectively reduced the statements needed for future calls to the Python function from five to one by having an FCMP function ready to call.

Looking Forward

So now that you can use Python functions in SAS just like SAS functions, how are you going to explore using these two languages together? The PROC FCMP Python object expands the capabilities of SAS and, as a result, your capabilities as a SAS user. Depending on your experience level, completing a task in Python might be easier for you than completing that same task in SAS. Or you could be in the scenario I mentioned before, where you have a major investment in Python and converting to SAS is non-trivial. In either case, PROC FCMP now has the capability to help you bridge that gap.

SAS or Python? Why not use both? Using Python functions inside SAS programs was published on SAS Users.

April 3, 2019
 

Structuring a highly unstructured data source

Human language is astoundingly complex and diverse. We express ourselves in infinite ways. It can be very difficult to model and extract meaning from both written and spoken language. Usually the most meaningful analysis uses a number of techniques.

While supervised and unsupervised learning, and specifically deep learning, are widely used for modeling human language, there’s also a need for syntactic and semantic understanding and domain expertise. Natural Language Processing (NLP) is important because it can help to resolve ambiguity and add useful numeric structure to the data for many downstream applications, such as speech recognition or text analytics. Machine learning runs outputs from NLP through data mining and machine learning algorithms to automatically extract key features and relational concepts. Human input from linguistic rules adds to the process, enabling contextual comprehension.

Text analytics provides structure to unstructured data so it can be easily analyzed. In this blog, I would like to focus on two widely used text analytics techniques: information extraction and entity resolution.

Information Extraction

Information Extraction (IE) automatically extracts structured information from an unstructured or semi-structured text data type -- for example, a text file -- to create new structured text data. IE works at the sub-document level, in contrast with techniques such as categorization, which work at the document or record level. Therefore, the results of IE can further feed into other analyses, like predictive modeling or topic identification, as features for those processes.

IE can also be used to create a new database of information. One example is the recording of key information about terrorist attacks from a group of news articles on terrorism. Any given IE task has a defined template, which is a case frame (or set of case frames) to hold the information contained in a single document. For the terrorism example, a template would have slots corresponding to the perpetrator, victim, and weapon of the terroristic act, and the date on which the event happened. An IE system for this problem is required to “understand” an attack article only enough to find data corresponding to the slots in this template. Such a database can then be used and analyzed through queries and reports about the data.

In their new book, SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, authors Teresa Jade, Biljana Belamaric Wilsey, and Michael Wallis, give some great examples of uses of IE:

"One good use case for IE is for creating a faceted search system. Faceted search allows users to narrow down search results by classifying results by using multiple dimensions, called facets, simultaneously. For example, faceted search may be used when analysts try to determine why and where immigrants may perish. The analysts might want to correlate geographical information with information that describes the causes of the deaths in order to determine what actions to take."

Another good example of using IE in predictive models comes from analysts at a bank who want to determine why customers close their accounts. They have an active churn model that works fairly well at identifying potential churn, but less well at determining what causes the churn. An IE model could be built to identify different bank policies and offerings, and then track mentions of each during any customer interaction. If a particular policy could be linked to certain churn behavior, then the policy could be modified to reduce the number of lost customers.

Reporting information found as a result of IE can provide deeper insight into trends and uncover details that were buried in the unstructured data. An example of this is an analysis of call center notes at an appliance manufacturing company. The results of IE show a pattern of customer-initiated calls about repairs and breakdowns of a type of refrigerator, and the results highlight particular problems with the doors. This information shows up as a pattern of increasing calls. Because the content of the calls is being analyzed, the company can return to its design team, which can find and remedy the root problem.

Entity Resolution and regular expressions

Entity resolution is the technique of recognizing when two observations relate to the same entity (thing, person, company) despite having been described differently; and, conversely, recognizing when two observations do not relate to the same entity despite having been described similarly. For example, you may be listed in one database as S Roberts, Sian Roberts, and S. Roberts. All refer to the same person but would be treated as different people in an analysis unless they are resolved (combined into one person).

Entity resolution can be performed as part of a data pre-processing step or as text analysis. Basically, one resolves multiple entries (cleans the data), and the other resolves references to a single entity to extract meaning; for example, pronoun resolution, when “it” refers to a particular company mentioned earlier in the text. Here is another example:

Assume each numbered item is a separate observation in the input data set:
1. SAS Institute is a great company. Our company has a recreation center and health care center for employees.
2. Our company has won many awards.
3. SAS Institute was founded in 1976.

The scoring output matches are below; note that the document ID associated with each match aligns with the number before the input document where the match was found.

Unstructured data clean-up

In the following section we focus on the pre-processing clean-up of the data. Unstructured data is the most voluminous form of data in the world, and analysts rarely receive it in perfect condition for processing. In other words, textual data needs to be cleaned, transformed, and enhanced before value can be derived from it.

A regular expression is a pattern that the regular expression engine attempts to match in input. In SAS programming, regular expressions are seen as strings of letters and special characters that are recognized by certain built-in SAS functions for the purpose of searching and matching. Combined with other built-in SAS functions and procedures, such as entity resolution, you can realize tremendous capabilities. Matthew Windham, author of Unstructured Data Analysis: Entity Resolution and Regular Expressions in SAS®, gives some great examples of how you might use these techniques to clean your text data in his book. Here we share one of them:

"As you are probably familiar with, data is rarely provided to analysts in a form that is immediately useful. It is frequently necessary to clean, transform, and enhance source data before it can be used—especially textual data."

Extract, Transform, and Load (ETL) is a general set of processes for extracting data from its source, modifying it to fit your end needs, and loading it into a target location that enables you to best use it (e.g., database, data store, data warehouse). We’re going to begin with a fairly basic example to get us started. Suppose we already have a SAS data set of customer addresses that contains some data quality issues. The method of recording the data is unknown to us, but visual inspection has revealed numerous occurrences of duplicative records. In this example, it is clearly the same individual with slightly different representations of the address and encoding for gender. But how do we fix such problems automatically for all of the records?

First Name  Last Name  DOB       Gender  Street             City      State  Zip
Robert      Smith      2/5/1967  M       123 Fourth Street  Fairfax,  VA     22030
Robert      Smith      2/5/1967  Male    123 Fourth St.     Fairfax   va     22030

Using regular expressions, we can algorithmically standardize abbreviations, remove punctuation, and do much more to ensure that each record is directly comparable. In this case, regular expressions enable us to perform more effective record keeping, which ultimately impacts downstream analysis and reporting. We can easily leverage regular expressions to ensure that each record adheres to institutional standards. We can make each occurrence of Gender either “M/F” or “Male/Female,” make every instance of the Street variable use “Street” or “St.” in the address line, make each City variable include or exclude the comma, and abbreviate State as either all caps or all lowercase. This example is quite simple, but it reveals the power of applying some basic data standardization techniques to data sets. By enforcing these standards across the entire data set, we are then able to properly identify duplicative references within the data set. In addition to making our analysis and reporting less error-prone, we can reduce data storage space and duplicative business activities associated with each record (for example, fewer customer catalogs will be mailed out, thus saving money).
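For instance, here is a hedged sketch of that kind of standardization with SAS Perl-style regular expression functions (the data set and column names are hypothetical, taken from the example above):

data work.addresses_std;
   set work.addresses;
   /* Standardize gender encoding to a single letter */
   gender = prxchange('s/^male$/M/i',   -1, strip(gender));
   gender = prxchange('s/^female$/F/i', -1, strip(gender));
   /* Expand the street abbreviation */
   street = prxchange('s/\bSt\b\.?/Street/', -1, street);
   /* Remove the trailing comma from the city and upcase the state */
   city   = prxchange('s/,\s*$//', -1, strip(city));
   state  = upcase(state);
run;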

Your unstructured text data is growing daily, and data without analytics is opportunity yet to be realized. Discover the value in your data with text analytics capabilities from SAS. The SAS Platform fosters collaboration by providing a toolbox where best practice pipelines and methods can be shared. SAS also seamlessly integrates with existing systems and open source technology.

Further Resources:
Natural Language Processing: What it is and why it matters

White paper: Text Analytics for Executives: What Can Text Analytics Do for Your Organization?

SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, by Teresa Jade, Biljana Belamaric Wilsey, and Michael Wallis

Unstructured Data Analysis: Entity Resolution and Regular Expressions in SAS®, by Matthew Windham

Text analytics explained was published on SAS Users.

April 1, 2019
 


Whether you are a strong believer in the power of dividing by zero, agnostic, undecided, a supporter, denier or anything in between and beyond, this blog post will bring all to a common denominator.

History of injustice

For how many years have you been told that you cannot divide by zero, that dividing by zero is not possible, not allowed, prohibited? Let me guess: it’s your age minus 7 (± 2).

But have you ever been bothered by that unfair restriction? Think about it: all other numbers get to be divisors. All of them, including positive, negative, rational, even irrational and imaginary. Why such an injustice and inequality before the Law of Math?

We have our favorites like π, and prime members (I mean numbers), but zero is the bottom of the barrel, the lowest of the low, a pariah, an outcast, an untouchable when it comes to dividing by. It does not even have a sign in front of it. Well, it’s legal to have, but it’s meaningless.

And that’s not all. Besides not being allowed in a denominator, zeros are literally discriminated against beyond belief. How else could you characterize the fact that zeros are declared as pathological liars as their innocent value is equated to FALSE in logical expressions, while all other more privileged numbers represent TRUE, even the negative and irrational ones!

Extraordinary qualities of zeros

Despite their literal zero value, their informational value and qualities are not less than, and in many cases significantly surpass, those of their siblings. In a sense, zero is the proverbial center of the universe, as all the other numbers are dislocated around it like planets around the sun. It is not coincidental that zeros are denoted as circles, which makes them forerunners and likely ancestors of the glorified π.

Speaking of π, what is all the buzz around it? It’s irrational. It’s inferior to 0: it takes 2 π’s just to draw a single zero (remember O=2πR?). Besides, zeros are not just well rounded, they are perfectly rounded.

Privacy protection experts and GDPR enthusiasts love zeros. While other small numbers are required to be suppressed in published demographical reports, zeros may be shown prominently and proudly as they disclose no one’s personally identifiable information (PII).

No number rivals zero. Zeros are perfect numerators and equalizers. If you divide zero by any non-zero member of the digital community, the result will always be zero. Always, regardless of the status of that member. And yes, zeros are perfect common denominators, despite being prohibited from that role for centuries.

Zeros are the most digitally neutral and infinitely tolerant creatures. What other number has tolerated for so long such abuse and discrimination!

Enough is enough!

Dividing by zero opens new horizons

Can you imagine what new opportunities will arise if we break that centuries-old tradition and allow dividing by zero? What new horizons will open! What new breakthroughs and discoveries can be made!

With no more prejudice and prohibition of the division by zero, we can prove virtually anything we wish. For example, here is a short 5-step mathematical proof of “4=5”:

1)   4 – 4 = 10 – 10
2)   2² – 2² = 5·(2 – 2)
3)   (2 + 2)·(2 – 2) = 5·(2 – 2) /* here we divide both parts by (2 – 2), that is, by 0 */
4)   (2 + 2) = 5
5)   4 = 5

Let’s make the next logical step. If dividing by zero can make any wish a reality, then producing a number of our choosing by dividing a given number by zero scientifically proves that division by zero is not only legitimate, but also feasible and practical.

As you will see below, division by zero is not that easy, but with the power of SAS, the power to know and the powers of curiosity, imagination and perseverance nothing is impossible.

Division by zero - SAS implementation

Consider the following use case. Say you think of a “secret” number, write it on a piece of paper and put in a “secret” box. Now, you take any number and divide it by zero. If the produced result – the quotient – is equal to your secret number, wouldn’t it effectively demonstrate the practicality and magic power of dividing by zero?

Here is how you can do it in SAS. A relatively “simple”, yet powerful SAS macro %DIV_BY_0 takes a single number as a numerator parameter, divides it by zero and returns the result equal to the one that is “hidden” in your “secret” box. It is the ultimate, pure artificial intelligence, beyond your wildest imagination.

All you need to do is to run this code:

 
data MY_SECRET_BOX;        /* you can use any dataset name here */
   MY_SECRET_NUMBER = 777; /* you can use any variable name here and assign any number to it */
run;
 
%macro DIV_BY_0(numerator);
 
   %if %sysevalf(&numerator=0) %then %do; %put 0:0=1; %return; %end;
   %else %let putn=&sysmacroname; 
   %let %sysfunc(putn(%substr(&putn,%length(&putn)),words.))=
   %sysevalf((&numerator/%sysfunc(constant(pi)))**&sysrc);  
   %let a=com; %let null=; %let nu11=%length(null); 
   %let com=*= This is going to be an awesome blast! ;
   %let %substr(&a,&zero,&zero)=*Close your eyes and open your mind, then;
   %let imagine = "large number like 71698486658278467069846772 Bytes divided by 0";
   %let O=%scan(%quote(&c),&zero+&nu11); 
   %let l=%scan(%quote(&c),&zero);
   %let _=%substr(%scan(&imagine,&zero+&nu11),&zero,&nu11);
   %let %substr(&a,&zero,&zero)%scan(&&&a,&nu11+&nu11-&zero)=%scan(&&&a,-&zero,!b)_;
   %do i=&zero %to %length(%scan(&imagine,&nu11)) %by &zero+&zero;
   %let null=&null%sysfunc(&_(%substr(%scan(&imagine,&nu11),&i,&zero+&zero))); %end;
   %if &zero %then %let _0=%scan(&null,&zero+&zero); %else;
   %if &nu11 %then %let _O=%scan(&null,&zero);
   %if %qsysfunc(&O(_&can)) %then %if %sysfunc(&_0(&zero)) %then %put; %else %put;
   %put &numerator:0=%sysfunc(&_O(&zero,&zero));
   %if %sysfunc(&l(&zero)) %then;
 
%mend DIV_BY_0;
 
%DIV_BY_0(55); /* parameter may be of any numeric value */

When you run this code, it will produce in the SAS LOG your secret number:

55:0=777

How is that possible without the magic of dividing by zero? Note that the %DIV_BY_0 macro has no knowledge of your dataset name, nor the variable name holding your secret number value to say nothing about your secret number itself.

That essentially proves that dividing by zero can practically solve any imaginary problem and make any wish or dream come true. Don’t you agree?

There is one limitation, though. We had to make this sacrifice for the sake of numeric social justice. If you invoke the macro with a parameter of 0, it will return 0:0=1 (not your secret number) to make it consistent with the rest of the non-zero numbers (no more exceptions!): “any number, except zero, divided by itself is 1”.

Challenge

Can you crack this code and explain how it does it? I encourage you to check it out and make sure it works as intended. Please share your thoughts and emotions in the Comments section below.

Disclosure

This SAS code contains no cookies, no artificial sweeteners, no saturated fats, no psychotropic drugs, no illicit substances or other ingredients detrimental to your health and integrity, and no political or religious statements. It does not collect, distribute or sell your personal data, in full compliance with FERPA, HIPAA, GDPR and other privacy laws and regulations. It is provided “as is” without warranty and is free to use on any legal SAS installation. The whole purpose of this blog post and the accompanying SAS programming implementation is to entertain you while highlighting the power of SAS and human intelligence, and to fool around in the spirit of the date of this publication.

Dividing by zero with SAS was published on SAS Users.

February 22, 2019
 

Have you heard? SAS recently announced a new practical programming credential, SAS® Certified Specialist: Base Programming Using SAS® 9.4. This new credential is different from our previous exams: it requires you to take a performance-based exam, in which you access a SAS environment and then write and execute SAS code. Practical, right? At the end, your answers are scored for correctness.

This new SAS Certified Specialist credential will run in parallel with the current SAS Certified Base Programmer credential until June 2019. So make sure to check out the complete exam content guide for a list of objectives that are tested on the exam!

A new certification guide


SAS is also releasing an accompanying certification guide to help you prepare for the new programming credential: SAS® Certified Specialist Prep Guide: Base Programming Using SAS® 9.4. This new certification guide has been streamlined to remove redundancy throughout the chapters and has been aligned with the courses, SAS® Programming 1: Essentials and SAS® Programming 2: Data Manipulation Techniques, along with the exam content guide.

The new certification guide also includes a workbook that provides programming scenarios to help you prepare for the performance-based portion of the exam. Make sure to also review the SAS® 9.4 Base Programming Exam Experience Tutorial to help you prepare!

Here are some of the changes made to the certification guide:

    • Removal of INFILE/INPUT statements. These statements are no longer taught in SAS® Programming 1 and SAS® Programming 2 courses or tested in the exam. Thus, these statements have been replaced with the SET statement and the IMPORT procedure.
    • Removal of arrays. Arrays are no longer taught in SAS® Programming 1 and SAS® Programming 2 courses or tested in the exam.
    • Addition of the TRANSPOSE and EXPORT procedures, as well as macro variables. These additions are tested and are taught in the SAS® Programming 1 and SAS® Programming 2 courses.
    • Updating of existing examples and the addition of more annotated examples that are easier to follow and have better quality graphics. These changes help to improve the readability of examples, and there is a closer relationship between the sample questions in the book and the questions that appear on the actual exam.
    • Installation of sample data, and the way you work with SAS libraries has been simplified.
    • A new companion piece, the Quick Syntax Reference Guide, is also available for download.

More opportunities to test your skills

For even more programming practice, check out Ron Cody’s Learning SAS by Example. It is full of practical examples and exercises with solutions that allow you to test your programming skills for the exam.

Not ready for the certification exam just yet? You can also browse our books for getting started with SAS. There you will find, of course, The Little SAS Book: A Primer, Fifth Edition and Exercises and Projects for The Little SAS® Book, Fifth Edition. Both are great guides for new users wanting to learn SAS and practice before taking the new Base Programming Specialist exam!

Thinking about getting SAS® certified? Check out the new SAS certification guide! was published on SAS Users.

February 6, 2019
 

Splitting external text data files into multiple files

Recently, I worked on a cybersecurity project that entailed processing a staggering number of raw text files about web traffic. Millions of rows had to be read and parsed to extract variable values.

The problem was complicated by the varying composition of the records. Each external raw file was a collection of records of different structures that required different parsing logic. Besides, those heterogeneous records could not possibly belong to the same rectangular data tables with fixed sets of columns.

Solving the problem

To solve the problem, I decided to employ a "divide and conquer" strategy: to split the external file into many files, each with a homogeneous structure, then parse them separately to create as many output SAS data sets.

My plan was to use a SAS DATA Step for looping through the rows (records) of the external file, read each row, identify its type, and based on that, write it to a corresponding output file.

It is much like how we would split a data set into many:

 
data CARS_ASIA CARS_EUROPE CARS_USA;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   output CARS_ASIA;
      when('Europe') output CARS_EUROPE;
      when('USA')    output CARS_USA;
   end;   
run;

But how do you switch between the output files? The idea came from SAS' Chris Hemedinger, who suggested using multiple FILE statements to redirect output to different external files.

Splitting an external raw file into many

As you know, one can use the PUT statement in a SAS DATA Step to output a character string or a combination of character strings and variable values into an external file. That external file (a destination) is defined by a FILE statement:

 
filename inf  'c:\temp\input_file.txt';
filename out1 'c:\temp\traffic.txt';
filename out2 'c:\temp\system.txt';
filename out3 'c:\temp\threat.txt';
filename out4 'c:\temp\other.txt';
 
data _null_;
   infile inf;
   input REC_TYPE $10. @;
   input;
   select(REC_TYPE);
      when('TRAFFIC') file out1;
      when('SYSTEM')  file out2;
      when('THREAT')  file out3;
      otherwise       file out4;
   end;
   put _infile_;
run;

In this code, the first INPUT statement retrieves the value of REC_TYPE. The trailing @ line-hold specifier ensures that the input record is held for the execution of the next INPUT statement within the same iteration of the DATA Step. It may not be used exactly as written, but the point is that you need to capture the field(s) of interest and stay on the same row.

The second INPUT statement reads the whole raw file record into the _infile_ DATA Step automatic variable.

Depending on the value of the REC_TYPE variable assigned in the first INPUT statement, the SELECT block toggles the FILE definition between one of the four filerefs: out1, out2, out3, or out4.

Then the PUT statement outputs the _infile_ automatic variable value to the output file defined in the SELECT block.

Splitting a data set into several external files

A similar technique can be used to split a data table into several external raw files. Let’s combine the two code samples above to demonstrate how:

 
filename outasi 'c:\temp\cars_asia.txt';
filename outeur 'c:\temp\cars_europe.txt';
filename outusa 'c:\temp\cars_usa.txt';
 
data _null_;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   file outasi;
      when('Europe') file outeur;
      when('USA')    file outusa;
   end;
   put _all_; 
run;

This code will read observations of the SASHELP.CARS data table and, depending on the value of the ORIGIN variable, PUT _ALL_ will output all the variables (including the automatic variables _ERROR_ and _N_) as named values (VARIABLE_NAME=VARIABLE_VALUE pairs) to one of the three external raw files specified by their respective file references (outasi, outeur, or outusa).

You can modify this code to produce delimited files with full control over which variables and in what order to output. For example, the following code sample produces 3 files with comma-separated values:

 
data _null_;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   file outasi dlm=',';
      when('Europe') file outeur dlm=',';
      when('USA')    file outusa dlm=',';
   end;
   put make model type origin msrp invoice; 
run;

You may use different delimiters for your output files. In addition, rather than using a mutually exclusive SELECT block, you may use different logic for redirecting your output to different external files, as sketched below.
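For example, here is a hedged sketch that routes records by a keyword found anywhere in the line, using IF-THEN logic instead of SELECT (the FILE statement is executable, so it can sit inside conditions; the filerefs are the ones defined earlier):

data _null_;
   infile inf;
   input;
   if      find(_infile_, 'TRAFFIC') then file out1;
   else if find(_infile_, 'SYSTEM')  then file out2;
   else if find(_infile_, 'THREAT')  then file out3;
   else                                   file out4;
   put _infile_;
run;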

Bonus: How to zip your output files as you create them

For those readers who are patient enough to read to this point, here is another tip. As described in this series of blog posts by Chris Hemedinger, in SAS you can read your external raw files directly from zipped files without unzipping them first, as well as write your output raw files directly into zipped files. You just need to specify that in your FILENAME statement. For example:

UNIX/Linux

 
filename outusa ZIP '/sas/data/temp/cars_usa.txt.gz' GZIP;

Windows

 
filename outusa ZIP 'c:\temp\cars.zip' member='cars_usa.txt';

Your turn

What is your experience with creating multiple external raw files? Could you please share with the rest of us?

How to split a raw file or a data set into many external raw files was published on SAS Users.