Tech Talk

April 9, 2017
 

Thanks to a new open source project from SAS, Python coders can now bring the power of SAS into their Python scripts. The project is SASPy, and it's available on the SAS Software GitHub. It works with SAS 9.4 and higher, and requires Python 3.x.

I spoke with Jared Dean about the SASPy project. Jared is a Principal Data Scientist at SAS and one of the lead developers on SASPy and a related project called Pipefitter. Here's a video of our conversation, which includes an interactive demo. Jared is obviously pretty excited about the whole thing.

Use SAS like a Python coder

SASPy brings a "Python-ic" sensibility to using SAS. That means that all of your access to SAS data and methods is surfaced using objects and syntax that are familiar to Python users. This includes the ability to exchange data via pandas, the ubiquitous Python data analysis framework. And even the native SAS objects are accessed in a very "pandas-like" way.

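import saspy
import pandas as pd

# Start a SAS session using the 'winlocal' configuration
sas = saspy.SASsession(cfgname='winlocal')
# Reference the SASHELP.CARS data set as a SAS data object
cars = sas.sasdata("CARS","SASHELP")
# Summarize the numeric variables, PROC MEANS style
cars.describe()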

The output is what you expect from pandas...but with statistics that SAS users are accustomed to. PROC MEANS anyone?

In[3]: cars.describe()
Out[3]: 
       Variable Label    N  NMiss   Median          Mean        StdDev  
0         MSRP     .   428      0  27635.0  32774.855140  19431.716674   
1      Invoice     .   428      0  25294.5  30014.700935  17642.117750   
2   EngineSize     .   428      0      3.0      3.196729      1.108595   
3    Cylinders     .   426      2      6.0      5.807512      1.558443   
4   Horsepower     .   428      0    210.0    215.885514     71.836032   
5     MPG_City     .   428      0     19.0     20.060748      5.238218   
6  MPG_Highway     .   428      0     26.0     26.843458      5.741201   
7       Weight     .   428      0   3474.5   3577.953271    758.983215   
8    Wheelbase     .   428      0    107.0    108.154206      8.311813   
9       Length     .   428      0    187.0    186.362150     14.357991   

       Min       P25      P50      P75       Max  
0  10280.0  20329.50  27635.0  39215.0  192465.0  
1   9875.0  18851.00  25294.5  35732.5  173560.0  
2      1.3      2.35      3.0      3.9       8.3  
3      3.0      4.00      6.0      6.0      12.0  
4     73.0    165.00    210.0    255.0     500.0  
5     10.0     17.00     19.0     21.5      60.0  
6     12.0     24.00     26.0     29.0      66.0  
7   1850.0   3103.00   3474.5   3978.5    7190.0  
8     89.0    103.00    107.0    112.0     144.0  
9    143.0    178.00    187.0    194.0     238.0  
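The pandas exchange mentioned earlier can be sketched roughly as follows. This is a minimal illustration, not something from the demo: it assumes an already-connected session object named sas, and it relies on the to_df() method of a SAS data object and the df2sd() method of the session to move data in each direction. The function name add_price_gap, the PriceGap column, and the cars_gap table name are made up for the example.

```python
def add_price_gap(sas):
    """Pull SASHELP.CARS into pandas, derive a column, and push it back to SAS.

    `sas` is assumed to be a connected saspy.SASsession object.
    """
    cars = sas.sasdata("CARS", "SASHELP")
    # Copy the SAS data set into a pandas DataFrame on the Python side
    df = cars.to_df()
    # Ordinary pandas column arithmetic: sticker price minus dealer invoice
    df["PriceGap"] = df["MSRP"] - df["Invoice"]
    # Send the DataFrame back to SAS as a new table (hypothetical name)
    return sas.df2sd(df, "cars_gap")

# Usage with a live session (needs a working SAS connection):
#   import saspy
#   sas = saspy.SASsession(cfgname='winlocal')
#   cars_gap = add_price_gap(sas)
```

The returned object is again a SAS data object, so methods such as describe() should be available on it just as they are on cars.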

SASPy also provides high-level Python objects for the most popular and powerful SAS procedures. These are organized by SAS product, such as SAS/STAT, SAS/ETS and so on. To explore, issue a dir() command on your SAS session object. In this example, I've created a sasstat object and I used dot<TAB> to list the available SAS analyses:

SAS/STAT object in SASPy
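If you are not working in an environment with tab completion, the same exploration can be done with plain dir(). Here is a small sketch; the helper name public_names is my own, and the commented usage assumes a connected session named sas whose sasstat() method returns the SAS/STAT object described above.

```python
def public_names(obj):
    """Return the attribute names that dir() reports for obj, minus private ones."""
    return [name for name in dir(obj) if not name.startswith("_")]

# Usage with a live session (needs a working SAS connection):
#   stat = sas.sasstat()
#   print(public_names(stat))   # lists the analyses available on the object
```

Filtering out the underscore-prefixed names just hides Python's internal attributes, so what remains is close to what dot<TAB> shows.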

The SAS Pipefitter project extends the SASPy project by providing access to advanced analytics and machine learning algorithms. In our video interview, Jared presents a cool example of a decision tree applied to the passenger survival factors on the Titanic. It's powered by PROC HPSPLIT behind the scenes, but Python users don't need to know all of that "inside baseball."

Installing SASPy and getting started

Like most things Python, installing the SASPy package is simple. You can use the pip package installer to fetch the latest version:

pip install saspy

However, since you need to connect to a SAS session to get to the SAS goodness, you will need some additional files to broker that connection. Most notably, you need a few Java jar files that SAS provides. You can find these in the SAS Deployment Manager folder for your SAS installation:
../deploywiz/sas.svc.connection.jar
../deploywiz/log4j.jar
../deploywiz/sas.security.sspi.jar
../deploywiz/sas.core.jar

The jar files are compatible between Windows and Unix, so if you find them in a Unix SAS install you can still copy them to your Python Windows client. You'll need to modify the sascfg.py file (installed with the SASPy package) to point to where you've stashed these. If using local SAS on Windows, you also need to make sure that sspiauth.dll is in your Windows system PATH. The easiest method is to add SASHOME\SASFoundation\9.4\core\sasexe to your system PATH variable.
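To give a feel for what that configuration edit looks like, here is a hedged sketch of a 'winlocal' entry. The folder C:/sas_jars is a hypothetical location for the copied jar files, and the key names (java, encoding, classpath) reflect my understanding of the configuration file at the time of writing; check the file shipped with your installed version, since the expected keys may differ.

```python
import os

# Hypothetical folder where the four jar files were copied
jar_dir = "C:/sas_jars"

jar_names = [
    "sas.svc.connection.jar",
    "log4j.jar",
    "sas.security.sspi.jar",
    "sas.core.jar",
]

# Java expects the platform's path separator (';' on Windows, ':' on Unix)
classpath = os.pathsep.join(jar_dir + "/" + name for name in jar_names)

# Names of the configurations that SASPy should offer
SAS_config_names = ["winlocal"]

# Assumed settings for a local Windows SAS session
winlocal = {
    "java": "java",              # Java executable, assumed to be on PATH
    "encoding": "windows-1252",  # assumed session encoding
    "classpath": classpath,
}
```

With an entry like this in place, the cfgname='winlocal' argument used earlier selects it when the session starts.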

All of this is documented in the "Installation and Configuration" section of the project documentation. The connectivity options support an impressively diverse set of SAS configs: Windows, Unix, SAS Grid Computing, and even SAS on the mainframe!

Download, comment, contribute

SASPy is an open source project, and all of the Python code is available for your inspection and improvement. The developers at SAS welcome you to give it a try and enter issues when you see something that needs to be improved. And if you're a hotshot Python coder, feel free to fork the project and issue a pull request with your suggested changes!

The post Introducing SASPy: Use Python code to access SAS appeared first on The SAS Dummy.

May 30, 2014
 

I wish I had a nickel for every time I heard this question at SAS Global Forum:

"So, does this SAS Studio thing replace SAS Enterprise Guide?"

SAS Studio is a pretty big deal. It's groundbreaking in several ways:

  • It's a web-based programming interface to SAS. It runs in your browser, which means that end users don't have to install anything (when connecting to a remote SAS session).
  • It's an HTML5-based application, so there are no browser plugins needed. It runs on Windows, Macs, and even the iPad.
  • It's the basis for new offerings from SAS, most notably the SAS University Edition. This offering is free to just about any learner for non-commercial use. The SAS University Edition includes SAS, running in a virtual machine, packaged with SAS Studio as the user interface. Since its launch earlier this week, people have been downloading it like crazy.

You're going to be hearing a lot about SAS Studio. It was even the theme for this month's SAS Tech Report.
If you haven't seen SAS Studio, take a few minutes and watch my SAS Tech Talk interview with Shannon Smith, the SAS R&D testing manager for the product:

 

So what does this mean for those of us who have invested our skills and processes in SAS Enterprise Guide? If you read this blog regularly, you know that includes me! Does this "new app on the block" replace our beloved SAS Enterprise Guide? The answer is No -- and Yes.

No, SAS Studio isn't a direct SAS Enterprise Guide replacement. SAS Enterprise Guide continues to get new features, mostly targeting productivity enhancements and integration with other SAS offerings, such as SAS Visual Analytics. Many thousands of users around the world use SAS Enterprise Guide to manage process flows, reporting and analytics, database access, and custom processes. SAS Studio doesn't have all of that infrastructure (at least, not yet), and cannot step in to replace all of that.

But also, Yes: SAS Studio can replace some uses of SAS Enterprise Guide. If you use SAS Enterprise Guide simply as a way to manage SAS programs in your SAS environment, then you can certainly use SAS Studio instead (or as well) to develop and maintain those programs. SAS Studio also includes some tasks for non-programmers, similar to those found in SAS Enterprise Guide -- but for now the library isn't as rich as what you'll find in SAS Enterprise Guide. And with the SAS University Edition, SAS Studio will represent the first SAS experience for the next generation of SAS programmers.

Sometimes SAS users ask me (usually in a hushed tone): Why does SAS create these different applications that seem to compete with each other? Is there some sort of contest in SAS R&D to see which teams can outdo the others? My answer: while these apps might have a certain amount of overlap, they really do serve different purposes and different audiences. Our goal is to enable SAS users -- regardless of discipline, industry, or expertise -- with the tools that are most fit for their particular purpose. One size does not fit all (though some diehard PC SAS fans might disagree with me).

Plus, here's another secret: the same developers have built all of these applications. The SAS Studio development team includes people who worked on SAS Display Manager (you know, "PC SAS") and SAS Enterprise Guide. This is a direct benefit of SAS being such a great workplace: nobody leaves. That means that the lessons learned from customers and developers are carried over and applied in each successive "app generation". If developers are competing, then they are mostly competing with the proven work they've done in the past. But since the teams always have new technology and techniques at their disposal, it's the end users who win.

tags: SAS Enterprise Guide, SAS Studio, SASAnalyticsU, Tech Talk
May 12, 2014
 

It was just a couple of years ago that folks were skeptical about the term "data scientist". It seemed like a simple re-branding of an established job role that carried titles such as "business analyst", "data manager", or "reporting specialist".

But today, it seems that the definition of the "Data Scientist" job role has gelled into something new. At SAS Global Forum 2014, I heard multiple experts describe data science qualifications in a similar way, including these main skills:

  • Ability to manage data. Know how to access it, whether it's in Excel, relational databases, or Hadoop -- or on the Web. Data acquisition and preparation still form the critical foundation for any data analysis.
  • Knowledge of applied statistics. Perhaps not PhD-level stuff, but more than the basics of counts, sums, and averages. You need to know something about predictive analytics, forecasting, and the process of building and maintaining analytical models.
  • Computer science, or at least some programming skills. Point-and-click tools can help keep you productive, but it's often necessary to drop into code to achieve the flexibility you need to acquire some data or apply an analysis that's not provided "out of the box".
  • And finally -- and this makes a Data Scientist the most relevant -- the ability to understand and communicate the needs of the business. You might be a data wiz and have metrics out the wazoo, but an effective data scientist must know which fields and metrics matter most to the organization he or she serves. And you must be able to ask the right questions of the stakeholders, and then communicate results that will lead to informed action.

I don't claim to be a data scientist -- I'm not strong enough in the statistical pillar -- but I do have my moments. For example, I consider my recent analysis of blog spam to be data-science-like. Even so, I'm not brave enough to change my business cards just yet.

At SAS Global Forum I talked to Wayne Thompson, Chief Data Scientist at SAS. (Yes, even SAS is capitalizing on the buzz by having a data science technologies team.) Here he is introducing SAS In-Memory Statistics for Hadoop, a programming interface that's meant to empower data scientists:


 
Wayne and I also talked a couple of other times: once about SAS Visual Statistics ("it's the shizzle", says the bald white guy -- not me), and once about data science in general.

Data science isn't all just "Wayne's world" -- there were plenty of other data science practitioners at the conference. For example, check out Lisa Arney's interview with Chuck Kincaid of Experis, talking about how to be a data scientist using SAS. (See his full paper here.) Also check out SAS' Mary Osborne, who presented on Star Wars and the Art of Data Science. (Her paper reveals the unspoken fifth pillar of a data scientist: it's good to be part nerd.)

What do you think about the "new" field of data science? Have you changed your business card to include the "data scientist" title?

tags: data scientist, Hadoop, SAS global forum, SAS In-Memory Statistics for Hadoop, sasgf14, Tech Talk
March 18, 2014
 

Talking tech with Nancy R. at SASGF13

For the third year now, I'll be hosting the SAS Tech Talk shows at SAS Global Forum. (Since I've been invited back I can assume that I'm more of an Ellen DeGeneres than a Seth MacFarlane.)

These shows feature SAS technical experts (mostly from SAS R&D) who are prepared to discuss and demonstrate the technologies that they work on. These folks have job titles such as "Chief Data Scientist", "Principal Research Statistician", and "Senior Analytical Consultant". They work in areas such as "SAS Output Delivery and Reporting", "BI Data Visualization", and "Enterprise Management Integration".

These guests represent the SAS R&D talent that brings the SAS products to life. Some will be names that you know from books and blogs, such as Rick Wicklin. Others are return guests to SAS Tech Talk, such as Nascif and Himesh. And still others might be faces that are new to you, but they are the movers and shakers when it comes to SAS technology.

This year we have three SAS Tech Talk shows:

  • Monday, 24 March at 2pm EDT
  • Tuesday, 25 March at 2pm EDT
  • Wednesday, 26 March at 9am EDT

Each show is different, and we're covering 12 great topics in all. I'm very excited about this year's lineup! I'd list out the topics right here, but some are the subject of exciting product announcements and I don't want to steal anyone's thunder!

There is no doubt that you get the best conference experience when you can attend in person. But even if you cannot attend this year, you don't have to miss out. Much of the content -- the opening sessions, SAS Tech Talk, and many breakout sessions -- will be streamed on live video channels, all hosted at www.sasglobalforum.com. Any content that is streamed will also be archived so that you can watch it later (can you say "movie night"?).

See also
Previous SAS Tech Talk shows on YouTube

tags: sasgf14, Tech Talk
December 19, 2013
 

If there's anyone who represents the global nature of SAS software, it's Falko Schulz. He's a German who lives in Brisbane, Australia while he works for SAS R&D based in Cary, NC.

Falko works on the team that produces SAS Visual Analytics, specifically the "explorer" portion of the tool. He brings a ton of SAS knowledge and real-world experience to his role. In fact, you can see Falko demonstrate his SAS-coding chops in his recent posts on SAS blogs:

While he's creative and brilliant with all-things-SAS, I find Falko to be humble and personable. That's why I was excited to host him on a SAS Tech Talk session at SAS Global Forum last year. During our interview, Falko discusses how he manages working with a team while located half-a-world away. He also shows off several of the new features in SAS Visual Analytics Explorer, including some of the new visualizations and the built-in forecasting methods.

Watch the video to learn more about SAS Visual Analytics Explorer and the people who build it.

tags: SAS Visual Analytics, sasgf13, Tech Talk
September 12, 2013
 

SAS Data Management is a popular topic here on the SAS interwebs. You can find all types of information ranging from thought leadership to white papers to product details.

At SAS Global Forum I sat down with Nancy Rausch, one of the principal R&D managers behind the SAS Data Management suite of products. During our interview, Nancy talked about SAS support for data management at all levels, from the old-school ETL jobs that run on mainframes, to the latest products and techniques that support data management as a full discipline.

During this video, you'll see Nancy demonstrate the data management process using a new set of tools that allow data managers to focus on the "what" (what data, what terms, what events) and less on the tedious "how" (such as writing JCL programs and checking job status in the middle of the night).

For more information, be sure to visit SAS Data Management on www.sas.com.

tags: data management, sasgf13, Tech Talk
August 15, 2013
 

In SAS 9.4, the SAS programming language continues to add new features by the truckload. I've already discussed PROC DELETE (which is actually an old feature, but like an 80s hit song it's now back with a better version).

In this SAS Tech Talk video from SAS Global Forum 2013, I talked with Rick Langston about the advancements in the SAS programming language. Rick has been with SAS for...well, a long time. He's considered to be the steward of the SAS programming language. In this session, Rick discusses the process that we use to add new syntax to the language and to ensure its integrity.

 
Rick also talks about three specific new features in 9.4, all of which were added because customers asked for them. (It's difficult to read Rick's syntax examples in the video, so I've included reference links below so that you can learn more.)

FILENAME ZIP access method

This brings the ability to read and write compressed ZIP files directly into the SAS language. For more information, see the FILENAME ZIP documentation. If you don't have SAS 9.4, you can still create ZIP files using ODS PACKAGE.

DOSUBL function

Rick calls this "submitting SAS code on the side", as it allows you to run a SAS step or statement from "inside" a currently running step. You can learn more from the DOSUBL function reference, or from this SAS Global Forum paper. I've also written a post with a specific example in SAS Enterprise Guide.

LOCKDOWN system option and statement

This one will excite SAS administrators. You can set the LOCKDOWN system option in a batch SAS session or SAS Workspace server to limit some of the "dangerous" functions of SAS and, more importantly, limit the file areas in which the SAS session will operate. We don't currently have a documentation link for this, so I'll dive in a bit further in a future blog post.

That's just a small taste of what's new. Be sure to check out the complete What's New in SAS 9.4 document for even more goodies.

tags: SAS 9.4, SAS programming, sasgf13, Tech Talk
April 23, 2013
 

Even if you cannot attend SAS Global Forum next week, you can experience it virtually via the Livestream sessions.

This year I will reprise my role as host of SAS Tech Talks, a pair of live webcasts that feature SAS R&D professionals and their latest technological wares. Click "play" on the viewer below to see what's happening right now. Below this widget, you can see the scheduled SAS Tech Talks lineup. (Note that scheduled times are in the Pacific USA timezone -- coming to you from San Francisco!)


 

Monday, April 29 at 12:30 p.m. PT

SAS Web Editor
Mike Monaco, Director, Web SAS Technologies R&D

SAS Visual Analytics Explorer
Falko Schulz, Principal Software Developer, SAS BI Visualization R&D

SAS Mobile BI
Himesh Patel, Senior Director, SAS Data Visualization R&D

SAS program language advancements
Rick Langston, Senior Manager, SAS Platform R&D

DataFlux and Data Integration
Nancy Rausch, Senior Manager, SAS Data Management R&D

Tuesday, April 30 at 10:30 a.m. PT

SAS Marketing Automation
Brian Chick, Senior Manager, Customer Intelligence R&D

SAS App Works
Marty Tomasi, Director, SAS Middle Tier Platform R&D

ODS Graphics
Sanjay Matange, Director, SAS Scientific Visualization R&D

SAS Visual Statistics
Tonya Balan, Director, SAS Analytics Product Management

I've known most of these folks for a long time, and according to my quick math they possess well over 150 years of SAS experience (cumulatively, that is -- not each). And they all still work on cutting-edge projects that you'll want to learn more about!

tags: sasgf13, Tech Talk