Tech

4月 172017
 

In this post, I will explain how LASR Servers (both Distributed and Non-distributed) are started from the SAS Visual Analytics Administrator application. We will also look at one particular issue and I will provide you with more details to understand the situation and two strategies to address this issue.

When you start a LASR Server from the Visual Analytics Administrator application, the system processes your request in slightly different ways depending on whether you are starting a distributed or non-distributed LASR Server.

What happens in the background for a distributed LASR Server?

1.     The administrator starts a LASR Server from the Visual Analytics Administrator application.

2.     The Visual Analytics Administrator application generates the required code and sends it to the JES to be executed on the SAS Visual Analytics Root Node.

3.     The JES invokes the SAS Object Spawner on the Visual Analytics Root Node.

4.     The SAS Object Spawner initializes a SAS Workspace Server session.

5.     A client connection is established between the JES and the SAS Workspace Server session.
The JES sends the distributed LASR Server startup code generated in step 2, and the SAS Workspace Server session executes it.

6.     As soon as the distributed LASR Server startup code is executed, a distributed LASR Server instance is created. This instance is independent from the SAS Workspace Server session.
After the distributed LASR Server startup code executes, the SAS Workspace Server session, and the client connection drop down. But the LASR Server instance is up and persistent until the administrator stops it.

What happens for a non-distributed LASR Server?

1.     The administrator starts a LASR Server from the Visual Analytics Administrator application.

2.     The Visual Analytics Administrator application generates the required code and sends it to the JES to be executed on the SAS Visual Analytics Root Node.

3.     The JES invokes the SAS Object Spawner on the Visual Analytics Root Node.

4.     The SAS Object Spawner initializes a SAS Workspace Server session.

5.     A client connection is established between the JES and the SAS Workspace Server session.
The JES sends the non-distributed LASR Server startup code generated on step 2, and the SAS Workspace Server session executes it.

6.     As soon as the non-distributed LASR Server startup code is executed, a non-distributed LASR Server instance is created. This instance is dependent on the SAS Workspace Server session.
After the distributed LASR Server startup code executes, the SAS Workspace Server session, and the client connection become persistent until the administrator stops the non-distributed LASR Server.
The SAS Workspace Server session and the client connection are required for the non-distributed LASR Server to be up and persistent.

Potential non-distributed LASR Server issue

The client connection between the JES and the SAS Workspace Server session in non-distributed deployment is crucial. This connection has to be persistent during the non-distributed LASR Server lifetime. As soon as the JES or SAS Workspace Server session goes down, the client connection is broken and the non-distributed LASR Server will go down as well. And, its data become unavailable.

If administrators manage their SAS non-distributed LASR Servers from the SAS Visual Analytics Administrator web application, this will create a dependency between the non-distributed LASR Servers and the middle tier - more specifically with the SASServer1_1 instance (the sas.wip.soapservices application). This dependency does not exist for distributed LASR Servers.

When the dependency exists, each time the middle tier or the SASServer1_1 instance is down, all non-distributed LASR Servers that were managed using the SAS Visual Analytics Administrator will go down automatically because the client connection to the non-distributed LASR Server is broken and its associated SAS Workspace Server session will go down.

This issue will also appear if the network connection between the Middle Tier and the SAS Visual Analytics Root Node is broken. This will break the client connection and terminate the SAS Workspace Server session(s) and their associated non-distributed LASR Server(s).

How can we prevent the issue?

I can think of at least two techniques you can use to prevent the dependency of your non-distributed LASR Servers upon your middle-tier:

  • Start your non-distributed LASR Servers from a SAS program.
  • Configure your LASR Server to start using the AutoLoad facility.

Start your non-distributed LASR Server from a SAS program

Here is a sample program that you might use to start a non-distributed LASR Server:

/* Start the LASR Analytic Server by defining the LASR library*/
libname LASRLIB SASIOLA 
        startserver  host="sasserver01.race.sas.com" port=10013
        signer="sasserver05.race.sas.com:7980/SASLASRAuthorization";
 
/* Keep the SAS session up until SERVERTERM received */
proc vasmp;
   serverwait  port=10013;
quit;

 

This program could be scheduled or set as a service under UNIX or Windows to be executed as soon as SAS Visual Analytics is started/restarted. Whichever strategy you use, you will want to have the code run after these services are up and available:

The SAS Metadata Server.
The Object Spawner on the SAS Visual Analytics Root Node.
The SASLASRAuthaurization service on the Middle Tier:

Start the non-distributed LASR Server Using the AutoLoad capability

AutoLoad runs as a periodic scheduled task. In the standard configuration, a new run of AutoLoad is started every 15 minutes.
AutoLoad periodically scans the contents of a designated host directory, which is referred to as the AutoLoad data directory or drop zone.
AutoLoad will automatically start the associated SAS LASR Server, if it is not yet started.

If you do not already have a LASR library configured with AutoLoad, you will need to use the SAS Deployment Manager application to configure an AutoLoad directory.

In Administration Tasks, choose Configure an AutoLoad Directory for SAS Visual Analytics. And follow the instructions.

Once you have AutoLoad setup, you must make sure certain LASR Library Extended Attributes are set correctly to ensure that the AutoLoad process will automatically start your LASR Server. You will want to set/verify the following attributes for the LASR library:

  • To enable the LASR Server AutoStart only:

  • VA.AutoLoad.AutoStart: YES to indicate that you want the associated LASR Server to be started if it is not already running.
  • VA.AutoLoad.Enabled: YES to enable the AutoLoad process for the Library.
  • To enable the LASR table(s) AutoLoad (optional):

  • VA.Default.MetadataFolder: to relocate the table(s) metadata registration against a specific metadata folder.
  • VA.Default.Location: to specify the location of the table directory at OS level.
  • VA.Default.Sync.*: to manage the table(s) from the VA.Default.Location.

On your SAS Visual Analytics machine, you have to:

  • Set the appropriate security against the [SAS-Configuration-Directory]/Levn/Applications/AutoLoad/[Your-LASR-Server-Library-LibRef] to be sure that the appropriate LASR Server administrator will be allowed to manage it.
  • Set the AutoLoad process frequency (default is 15 minutes).
    “TIME_INTERVAL_MINUTES=[Your-Value]” in the [SAS-Configuration-Directory]/Levn/Applications/AutoLoad/[Your-LASR-Server-Library-LibRef]/schedule.[sh|bat] file.
    Note: Reference the SAS Visual Analytics Administration Guide for more about  the timing of AutoLoad.
  • Schedule the AutoLoad process using the LASR Server administrator account responsible to manage this particular LASR Server by executing the [SAS-Configuration-Directory]/Levn/Applications/AutoLoad/[Your-LASR-Server-Library-LibRef]/schedule.[sh|bat] script.

Validate the AutoLoad setup by stopping the SAS LASR Server from SAS Visual Analytics Administrator application, or using SAS code. Wait for the “TIME_INTERVAL_MINUTES=[Your-Value]“, and look at the LASR Server status in SAS Visual Analytics Administrator application, and the AutoLoad logs in [SAS-Configuration-Directory]/Levn/Applications/AutoLoad/[Your-LASR-Server-Library-LibRef]/Logs (one log file each time the AutoLoad process is executed).

Whichever technique is used, the non-distributed LASR server will no longer be dependent on the middle-tier.

I hope this article has been helpful to you.

How LASR servers are started from SAS Visual Analytics Administrator was published on SAS Users.

4月 132017
 
The Penitent Magdalene

Titian (Tiziano Vecellio) (Italian, about 1487 - 1576) The Penitent Magdalene, 1555 - 1565, Oil on canvas 108.3 × 94.3 cm (42 5/8 × 37 1/8 in.) The J. Paul Getty Museum, Los Angeles; Digital image courtesy of the Getty's Open Content Program.

Even if you are a traditional SAS programmer and have nothing to do with cybersecurity, you still probably have to deal with this issue in your day-to-day work.

The world has changed, and what you do as a SAS programmer is not just between you and your computer anymore. However, I have found that many of us are still careless, negligent or reckless enough to be dangerous.

Would you scotch-tape your house key to the front door next to the lock and go on vacation? Does anybody do that? Still, some of us have no problem explicitly embedding passwords in our code.

That single deadly sin, the thing that SAS programmers (or any programmers) must not do under any circumstances, is placing unmasked passwords into their code. I must confess, I too have sinned, but I have seen the light and hope you will too.

Password usage examples

Even if SAS syntax calls for a password, never type it or paste it into your SAS programs. Ever.

If you connect to a database using a SAS/ACCESS LIBNAME statement, your libname statement might look like:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password=mypwd1;

If you specify the LIBNAME statement options for the metadata engine to connect to the metadata server, it may look like:

libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw=abcdefg;

If you use LIBNAME statement options for the metadata engine to connect to a database, it may look like:

libname oralib meta library=oralib dbuser=orauser dbpassword=orapw;

In all of the above examples, some password is “required” to be embedded in the SAS code. But it does not mean you should put it there “as is,” unmasked. SAS program code is usually saved as a text file, which is stored on your laptop or somewhere on a server. Anyone who can get access to such a file would immediately get access to those passwords, which are key to accessing databases that might contain sensitive information. It is your obligation and duty to protect this sensitive data.

Hiding passwords in a macro variable or a macro?

I’ve seen some shrewd SAS programmers who do not openly put passwords in their SAS code. Instead of placing the passwords directly where the SAS syntax calls for, they assign them to a macro variable in AUTOEXEC.SAS, an external SAS macro file, a compiled macro or some other SAS program file included in their code, and then use a macro reference, e.g.

/* in autoexec.sas or external macro */
%let opw = mypwd1;
 
/* or */
 
%macro opw;
	mypwd1
%mend opw;
/* in SAS program */
libname mydblib oracle user=myusr1 password=&opw
        path=airdb_remote schema=hrdept;
 
/* or */
 
libname mydblib oracle user=myusr1 password=%opw
        path=airdb_remote schema=hrdept;

Clever! But it’s no more secure than leaving your house key under the door mat. In fact it is much less secure. One who wants to look up your password does not even need to look under the door mat, oh, I mean look at your program file where the password is assigned to a macro variable or a macro. For a macro variable, a simple %put &opw; statement will print the password’s actual value in the SAS log. In case of a macro, one can use %put %opw; with the same result.

In other words, hiding the passwords do not actually protect them.

What to do

What should you do instead of openly placing or concealing those required passwords into your SAS code? The answer is short: encrypt passwords.

SAS software provides a powerful procedure for encrypting passwords that renders them practically unusable outside of the SAS system.

This is PROC PWENCODE, and it is very easy to use. In its simplest form, in order to encrypt (encode) your password abc123 you would need to submit just the following two lines of code:

proc pwencode in="abc123";
run;

The encrypted password is printed in the SAS log:

1 proc pwencode in=XXXXXXXX;
2 run;

{SAS002}3CD4EA1E5C9B75D91A73A37F

Now, you can use this password {SAS002}3CD4EA1E5C9B75D91A73A37F in your SAS programs. The SAS System will seamlessly take care of decrypting the password during compilation.

The above code examples can be re-written as follows:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password="{SAS002}9746E819255A1D2F154A26B7";
 
libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw="{SAS002}9FFC53315A1596D92F13B4CA";
 
libname oralib meta library=oralib dbuser=orauser
dbpassword="{SAS002}9FFC53315A1596D92F13B4CA";

Encryption methods

The {SAS002} prefix indicates encoding method. This SAS-proprietary encryption method which uses 32-bit key encryption is the default, so you don’t have to specify it in the PROC PWENCODE.

There are other, stronger encryption methods supported in SAS/SECURE:

{SAS003} – uses a 256-bit key plus 16-bit salt to encode passwords,

{SAS004} – uses a 256-bit key plus 64-bit salt to encode passwords.

If you want to encode your password with one of these stronger encryption methods you must specify it in PROC PWENCODE:

proc pwencode in="abc123" method=SAS003;
run;

SAS Log: {SAS003}50374C8380F6CDB3C91281FF2EF57DED10E6

proc pwencode in="abc123" method=SAS004;
run;

SAS Log: {SAS004}1630D14353923B5940F3B0C91F83430E27DA19164FC003A1

Beyond encryption

There are other methods of obscuring passwords to protect access to sensitive information that are available in the SAS Business Intelligence Platform. These are the AUTHDOMAIN= SAS/ACCESS option supported in LIBNAME statements, as well as PROC SQL CONNECT statements, SAS Token Authentication, and Integrated Windows Authentication. For more details, see Five strategies to eliminate passwords from your SAS programs.

Conclusion

Never place unencrypted password into your SAS program. Encrypt it!

Place this sticker in front of you until you memorize it by heart.

PROC PWENCODE sticker

 

One deadly sin SAS programmers should stop committing was published on SAS Users.

4月 112017
 

You may be wondering if you need something special to gain access to the Schedule Chart object. Don’t worry, you don’t, you just need to unhide this visualization if it isn’t already. You can do this from the Objects’ drop down menu. There are several other objects available to you if you’d like to show those as well. Simply check the ones you want to include in the list and then click ok.

The VA 7.3 Schedule Chart is similar to the traditional Gantt chart in that it serves to illustrate the start and finish duration of a category data item. You must provide a category data item and two date or datetime data items representing the start and end dates. You can also add a group by category, lattice by columns and/or rows.   Here is a simple example that visualizes my local school district’s 2016-2017 Traditional School Calendar.

Schedule charts can be used to visualize a variety of data such as:

  • Calendar Events
  • Project Tracking
  • Campaign/Promotional Runs
  • Floor Service Coverage

Essentially, any category which can be associated with a start and end date can use this visualization.

Here are some examples of using the Schedule Chart to look at project tracking data. Our team uses a similar visualization; however, I have modified real names and gave it a Star Wars theme for a bit of fun.

Example 1

In this example, the Project name is assigned as the main category. Here are a few takeaways:

  • The schedule chart gives a great bird’s eye view of a lot of data. This particular data has over 350 projects spanning a team of 19 individual members.
  • The schedule chart automatically includes the least and greatest date value. You can override the X Axis in the Properties tab by assigning a fixed minimum and maximum.

In this next screenshot, I have selected the manager Obi-Wan Kenobi to filter the Schedule Chart. Therefore, by adding section filters to this report, you can see how spotting coverage of Projects and Project Types are easy. And, with some Mock Combat planned later in the year, the Jedis might want to up their training.

Example 2

This example uses the same Schedule Chart role assignments as before, but different section prompt filters. Here, this report shows how an individual team member can use the Schedule Chart to visualize several things:

  • A list of his/her assigned projects.
  • The planned duration of each project.
  • How the projects are spread throughout the year.

If you chose to look at a particular Project Type, then this visualization would help list the Project names and how they span across the year.

Example 3

The next example moves away from the traditional use of putting the Project as the main category data item and instead places the Team Lead on the Y Axis. This now allows us to see how busy each Team member is and with which Project Type. By the way, did you notice the neat way the Team names are sorted? I used a custom sort!

Here are just a few more things you can do to enhance this visualization: you can adjust the transparency so you can see overlapping projects and you can easily add reference lines to the X Axis. In this example, I’ve added the reference lines for Q1 through Q4. I’ve also selected a manager from the report section filters and added an additional data item for the Label.

Example 4

In this example, I wanted to demonstrate the use of the Lattice rows and I also applied a Display Rule for the Project Type. This is a good way if you want to view overlapping information and the transparency property isn’t distinct enough.

 

Schedule Chart Limitations

If you want a bar representation on the Schedule Chart to appear then the data must have a start and end date for every row of data. If either is missing, then the category name will appear on the Y Axis but no bar will be displayed on the visualization.   Also, you cannot create an interaction from or to the Schedule Chart object. That means you cannot create a filter or brush with another object in the report area. The Schedule Chart will be filtered by either report or section prompts.   I hope you can include the Schedule Chart into your reports, it is one of my favorite visuals.

SAS Visual Analytics Designer 7.3 Schedule Chart was published on SAS Users.

4月 102017
 

You can expand on the functionality of SAS Visual Data Builder in SAS Visual Analytics by editing the query code, adding code for pre- and post-processing, or even writing your own query.   You can process single tables or join multiple tables, writing the output to a LASR  library, a SAS library, or a DBMS library.  But you can also easily schedule your queries, right from the Visual Data Builder interface.

Here’s how.

When a query is open in the workspace of Visual Data Builder, you can schedule the query from the application by clicking the Schedule (clock) icon.

The scheduling server used is determined by the SAS Visual Data Builder Scheduling preferences setting, shown below.

By default, the Visual Analytics deployment includes the Operating System Services scheduling server, so it appears automatically as the default.

The Server Manager plug-in to SAS Management Console identifies the scheduling servers that are included in your deployment. You can specify a different scheduling server, such as Platform Suite for SAS server, if your deployment includes it.

Note:  The Distributed In-process scheduling server is not supported.

Any scheduling preferences that you change are used the next time you create and schedule a query. If you need to change the settings for a query that is already scheduled, you can use SAS Management Console Schedule Manager to redeploy the deployed job for the query.

When you schedule a query, the SAS statements are saved in a file in the default deployment directory path: SAS-config-dir/Lev1/SASApp/SASEnvironment/SASCode/Jobs.

In the examples in this blog, the SAS-config-dir is /opt/sasinside/vaconfig.
The metadata name of the directory is Batch Jobs.
The default SAS Application Server name associated with the directory is SASApp.

If you are working in a VA environment where multiple application servers are defined, you should be aware of the following SAS Notes at the links below, relating to the application’s choice of application servers for scheduling.

SAS Note 58186SAS® Visual Data Builder might use the wrong application server for scheduling
SAS Note 52977SAS® Visual Data Builder requires the default SAS® Application Server and the default scheduling servers to be located on the same physical machine

To schedule a query, open the query and select the Schedule (clock) icon. (The clock is grayed out if you have not saved the query.)

You can schedule the query to run immediately (Run now) or at a specified time event.  To define a time event, select the Select one or more triggers for this query button and click New Time Event.  Grouping events are not supported for the default server, but may be supported for other scheduling servers, such as Platform Suite.

You can schedule for One time only, or More than once, running Hourly, Daily, Weekly, Monthly, or Yearly.  The appearance of the interface and scheduling parameters change with your specification.

In this example, a One time only event is specified.

 

The time event specification gets recorded in the Trigger list on the Schedule page, and is selected in the Used column.

After you click OK in the Schedule window, you will get the confirmation below.

After the time event has passed, you can verify that the table has been loaded on the LASR Tables tab of the Visual Analytics Administrator.

When you schedule, the Visual Data Builder:

  • creates a job that executes the query.
  • creates a deployed job from the job.
  • places the job into a new deployed flow.
  • schedules the flow on a scheduling server.

The files are named according to vdb_query_id_timestamp.
In this example the files are named vdb_CustomerInfoData_1490112883364_timestamp.
When the query executes at the scheduled time, the SAS code that is written to the /opt/sasinside/vaconfig/Lev1/SASApp/SASEnvironment/SASCode/Jobs directory.  The query is run with the user ID that scheduled it.

If you right-click on Server Manager in SAS Management Console and view Deployment Directories, you will see that this is the Deployment directory (Batch Jobs) for SASApp.

In the /opt/sasinside/vaconfig/Lev1/SASApp/BatchServer/Logs directory, you can view the SAS Log.

The scheduling server script and log are in /opt/sasinside/vaconfig/Lev1/SchedulingServer/Ahmed/vdb_CustomerInfoData_14900112883364

Observe that the script was written to this location at the time the job was scheduled, rather than at execution time.

If you edit a data query that is already scheduled, you must click the schedule icon again so that the SAS statements for the data query are regenerated and saved.

If you edit the query again and specify additional time events, each event appears in the trigger list, and you can check which time event is to be used for scheduling.

If scheduling a query according to time events, you should also be aware of this Usage note:

Usage Note 55880: Scheduled SAS® Visual Data Builder queries are executed based on the time zone of the scheduling server 

And to add to the fun, also keep in mind that if your deployment includes SAS Data Integration Studio, you can also export a query as a Job and then perform the deployment steps using DI Studio.

Just right-click on the query in the SAS folder panel in Visual Data Builder and Select Export as a Job!

Easy Scheduling in Visual Data Builder - SAS Visual Analytics 7.3 was published on SAS Users.

4月 052017
 

Emma Warrillow, President of Data Insight Group, Inc., believes analysts add business value when they ask questions of the business, the data and the approach. “Don’t be an order taker,” she said.

Emma Warrillow at SAS Global Forum.

Warrillow held to her promise that attendees wouldn’t see a stitch of SAS programming code in her session Monday, April 3, at SAS Global Forum.

Not that she doesn’t believe programming skills and SAS Certifications aren’t important. She does.

Why you need communication skills

But Warrillow believes that as technology takes on more of the heavy lifting from the analysis side, communication skills, interpretation skills and storytelling skills are quickly becoming the data analyst’s magic wand.

Warrillow likened it to the centuries-old question: If a tree falls in a forest, and no one is around to hear it, did it make a sound? “If you have a great analysis, but no one gets it or takes action, was it really a great analysis?” she asked.


If you have a great analysis, but no one gets it or takes action, was it really a great analysis?
Click To Tweet


To create real business value and be the unicorn – that rare breed of marketing technologist who understands both marketing and marketing technology – analysts have to understand the business and its goals and operations.

She offered several actionable tips to help make the transition, including:

1. Never just send the spreadsheet.

Or the PowerPoint or the email. “The recipient might ignore it, get frustrated or, worse yet, misinterpret it,” she said. “Instead, communicate what you’ve seen in the analysis.”

2. Be a POET.

Warrillow is a huge fan of the work of Laura Warren of Storylytics.ca. who recommends an acronym approach to data-based storytelling and making sure every presentation offers:

  • Purpose: The purpose of this chart is to …
  • Observation: To illustrate that …
  • Explanation: What this means to us is …
  • Take-away or Transition: As a next step, we recommend …

3. Brand your work.

“Many of us suffer from a lack of profile in our organizations,” she said. “Take a lesson from public relations and brand yourselves. Just make sure you’re a brand people can trust. Have checks and balances in place to make sure your data is accurate.”

4. Don’t be an order taker.

Be consultative and remember that you are the expert when it comes to knowing how to structure the campaign modeling. It can be tough in some organizations, Warrillow admitted, but asking some questions and offering suggestions can be a great way to begin.

5. Tell the truth.

“Storytelling can be associated with big, tall tales,” she said. “You have to have stories that are compelling but also have truth and resonance.” One of her best resources is The Four Truths of the Storyteller” by Peter Gruber, which first appeared in Harvard Business Review December 2007.

6. Go higher.

Knowledge and comprehension are important, “but we need to start moving further up the chain,” Warrillow said. She used Bloom’s Taxonomy to describe the importance of making data move at the speed of business – getting people to take action by moving into application, analysis, synthesis and evaluation phases.

7. Prepare for the future.

“Don’t become the person who says, ‘I’m this kind of analyst,’” she said. “We need to explore new environments, prepare ourselves with great skills. In the short term, we’re going to need more programming skills. Over time, however, we’re going to need interpretation, communication and storytelling skills.” She encouraged attendees to answer the SAS Global Forum challenge of becoming a #LifeLearner.

For more from Warrillow, read the post, Making data personal: big data made small.

7 tips for becoming a data science unicorn was published on SAS Users.

4月 032017
 

At Opening Session, SAS CEO Jim Goodnight and Alexa have a chat using the Amazon Echo and SAS Visual Analytics.

Unable to attend SAS Global Forum 2017 happening now in Orlando? We’ve got you covered! You can view live stream video from the conference, and check back here for important news from the conference, starting with the highlights from last night’s Opening Session.

While the location and record attendance made for a full house this year, CEO Jim Goodnight explained that there couldn’t be a more perfect setting to celebrate innovation than the world of Walt Disney. “Walt was a master innovator, combining art and science to create an entirely new way to make intelligent connections,” said Goodnight. “SAS is busy making another kind of intelligent connection – the kind made possible by data and analytics.”

It’s SAS’ mission to bring analytics everywhere and to make it ambient. That was exactly the motivation that drove SAS nearly four years ago when embarking on a massive undertaking known as SAS® Viya™. But SAS Viya – announced last year in Las Vegas – is more than just a fast, powerful, modernized analytics platform. Goodnight said it’s really the perfect marriage of science and art.

“Consider what would be possible if analytics could be brought into every moment and every place that data exists,” said Goodnight. “The opportunities are enormous, and like Walt Disney, it’s kind of fun to do the impossible.”

Driving an analytics economy

Executive Vice President and Chief Marketing Officer Randy Guard took the stage to update attendees on new releases available on SAS Viya and why SAS is so excited about it. And he explained the reason for SAS Viya comes from the changes being driven in the analytics marketplace. It’s what Guard referred to as an analytics economy – where the maturity of algorithms and techniques progress rapidly. “This is a place where disruption is normal, a place where you want to be the disruptor; you want to be the innovator,” said Guard. That’s exactly what you can achieve with SAS Viya.

As if SAS Viya didn’t leave enough of an impression, Guard took it one step further by inviting Goodnight back on stage to give users a preview into the newest innovation SAS has been cooking up. Using the Amazon Echo Dot – better known as Alexa – Goodnight put cognitive computing into action as he called up annual sales, forecasts and customer satisfaction reports in SAS® Visual Analytics.

Though still in its infant stages of development, the demo was just another reminder that when it comes to analytics, SAS never stops thinking of the next great thing.

AI: The illusion of intelligence

On his Segway, Executive Vice President and Chief Technology Officer Oliver Schabenberger talks AI at the SAS Global Forum Opening Session.

With his Segway Mini, Executive Vice President and Chief Technology Officer Oliver Schabenberger rolled on stage, fully trusting that his “smart legs” wouldn’t drive him off and into the audience. “I’ve accepted that algorithms and software have intelligence; I’ve accepted that they make decisions for us, but we still have choices,” said Schabenberger.

Diving into artificial intelligence, he explained that today’s algorithms operate with super-human abilities – they are reliable, repeatable and work around the clock without fatigue – yet they don’t behave like humans. And while the “AI” label is becoming trendy, true systems deserving of the AI title have two distinct things in common: they belong to the class of weak AI systems and they tend to be based on deep learning.

So, why are those distinctions important? Schabenberger explained that a weak AI system is trained to do one task only – the system driving an autonomous vehicle cannot operate the lighting in your home.

“SAS is very much engaged in weak AI, building cognitive systems into our software,” he said. “We are embedding learning and gamification into solutions and you can apply deep learning to text, images and time series.” Those cognitive systems are built into SAS Viya. And while they are powerful and great when they work, Schabenberger begged the question of whether or not they are truly intelligent.

Think about it. True intelligence requires some form of creativity, innovation and independent problem solving. The reality is, that today’s algorithms and software, no matter how smart, are being used as decision support systems to augment our own capabilities and make us better.

But it’s uncomfortable to think about fully trusting technology to make decisions on our behalf. “We make decisions based on reason, we use gut feeling and make split-second judgment calls based on incomplete information,” said Schabenberger. “How well do we expect machines to perform [in our place]when we let them loose and how quickly do we expect them to learn on the job?”

It’s those kinds of questions that prove that all we can handle today is the illusion of intelligence. “We want to get tricked by the machine in a clever way,” said Schabenberger. “The rest is just hype.”

Creating tomorrow‘s analytics leaders

With a room full of analytics leaders, Vice President of Sales Emily Baranello asked attendees to consider where the future leaders of analytics will come from. If you ask SAS, talent will be pulled from universities globally that have partnered with SAS to create 200 types of programs that teach today’s students how to work in SAS software. The commitment level to train up future leaders is evident and can be seen in SAS certifications, joint certificate programs and SAS’ track toward nearly 1 million downloads of SAS® Analytics U.

“SAS talent is continuing to building in the marketplace,” said Baranello. “Our goal is to bring analytics everywhere and we will continue to partner with universities to ready those students to be your successful employees.”

Using data for good

More than just analytics and technology, SAS’ brand is a representation of people who make the world a better place. Knowing that, SAS announced the development of GatherIQ – a customized crowdsourcing app that will begin with two International Organization for Migration (IMO) projects. One project will specifically focus on global migration, using data to keep migrants safe as they search for a better life. With GatherIQ, changing the world might be as easy as opening an app.

There's much more to come, so stay tuned to SAS blogs this week for the latest updates from SAS Global Forum!

SAS Viya, AI star at SAS Global Forum Opening Session was published on SAS Users.

4月 012017
 

Solar farm on SAS campus The full text of Fermat's statement, written in Latin, reads "Cubum autem in duos cubos, aut quadrato-quadratum in duos quadrato-quadratos, et generaliter nullam in infinitum ultra quadratum potestatem in duos eiusdem nominis fas est dividere cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet."

The English translation is: "It is impossible for a cube to be the sum of two cubes, a fourth power to be the sum of two fourth powers, or in general for any number that is a power greater than the second to be the sum of two like powers. I have discovered a truly marvelous demonstration of this proposition that this margin is too narrow to contain."

Here at SAS, we don’t take challenges lightly. After a short but intensive brainstorming, we came up with a creative and powerful SAS code that effectively proves this long-standing theorem. And it is so simple and short that not only can it be written on the margins of this blog, it can be tweeted!

Drum roll, please!

Here is the SAS code:

data _null_; 
	do n=3 by 1; 
		do a=1 by 1; 
			do b=1 by 1; 
				do c=1 by 1; 
					e = a**n + b**n = c**n;
					if e then stop; 
				end;
			end; 
		end;
	end;
run;

Or written compactly, without unnecessary spaces:

data _null_;do n=3 by 1;do a=1 by 1;do b=1 by 1;do c=1 by 1;e=a**n+b**n=c**n;if e then stop;end;end;end;end;run;

which is exactly 112 character long – well below the Twitter 140-character threshold.

Don’t be fooled by the utter simplicity and seeming unfeasibility of this code.  For the naysayers, let me clarify that we run this code in a distributed multithreaded environment where each do-loop runs as a separate thread.

We also use some creative coding techniques:

1.     Do-loop with just two options, count= and by=, but without the to= option (e.g. do c=1 by 1;). It is a valid syntax in SAS and serves the purpose of creating infinite loops when they are necessary (like in this case). You can easily test it by running the following SAS code snippet:

data _null_;
	start = datetime();
	do i=1 by 1;
		if intck('sec',start,datetime()) ge 20 then leave;
	end;
run;

The if-statement here is added solely for the purpose of specifying a wait time (e.g. 20) sufficient for persuading you in the loop’s infiniteness. Skeptics may increase this number to their comfort level or even remove (or comment out) the if-statement and enjoy the unconstrained eternity.

2.     Expression with two “=” signs in it (e.g. e = a**n + b**n = c**n;) Again, this is a perfectly valid expression in SAS and serves the purpose of assigning a variable the value of 0 or 1 resulting from a logical comparison operation. This expression can be rewritten as

e = a**n + b**n eq c**n;

or even more explicitly as

e = (a**n + b**n eq c**n);

As long as the code runs, the theorem is considered proven. If it stops, then the theorem is false.

You can try running this code on your hardware, at your own risk, of course.

We have a dedicated 128-processor UNIX server powered by an on-campus solar farm that has been autonomously running the above code for 40 years now, and there was not a single instance when it stopped running. Except pausing for the scheduled maintenance and equipment replacements.

During the course of this historic experience, we have accumulated an unprecedented amount of big data (all in-memory), converted it into event stream processing, and become a leader in data mining and business analytics.

This leads us to the following scientific conclusion: whether you are a pure mathematician or an empiricist, you can rest assured that Fermat's Last Theorem has been proven with a probability asymptotic to 1 beyond a reasonable doubt.

Have a happy 91-st day of the year 2017!

 

SAS code to prove Fermat's Last Theorem was published on SAS Users.

3月 292017
 

There is a well-known Russian saying that goes “Если нельзя, но очень хочется, то можно.” The English translation of it can span anywhere from “If you can’t, but want it badly, then you can” to “If you shouldn’t, but want it badly, then you should” to “If you may not, but want it badly, then you may.” Depending on your situation, any possible combination of “may,” “can,” or “should” may apply. You can even replace “want” with “need” to get a slightly different flavor.

There are known means of modifying variable attributes with PROC DATASETS, but they are limited to variable name, format, informat, and label. But what if we want/need to modify a variable length, or change a variable type? And I am not talking about creating a new variable with a different length or converting a numeric variable value into a character value of another variable. We want to change variable type and/or variable length in place, without adding new variables. If you believe it can’t be done, read the first paragraph again.

Imagine that we have two data tables that we need to concatenate into one table. However, there is one common variable that is of different type in each table – in the first table it is numeric, but in the second table it is character.

Sample data

Let’s create some sample data to emulate our situation by running the following SAS code:

libname sasdl 'C:PROJECTS_BLOG_SASchanging_variable_type_and_length_in_sas_datasets';
 
/* create study2016 data table */
data sasdl.study2016;
	length subjid dob 8 state $2 start_date end_date 8;
	infile datalines truncover;
	input subjid dob : mmddyy10. state start_date : mmddyy10. end_date : mmddyy10.;
	format dob start_date end_date mmddyy10.;
	datalines;
123456 08/15/1960 MD 01/02/2016
234567 11/13/1970 AL 05/12/2016 12/30/2016
;
 
/* create study2017 data table */
data sasdl.study2017;
	length subjid $6 dob 4 state $2 start_date end_date 4;
	infile datalines truncover;
	input subjid dob : mmddyy10. state start_date : mmddyy10. end_date : mmddyy10.;
	format dob start_date end_date mmddyy10.;
	datalines;
987654 03/15/1980 VA 02/13/2017
876543 11/13/1970 NC 01/11/2017 01/30/2017
765432 12/15/1990 NY 03/14/2017
;

The produced data tables will look as follows:

Table STUDY2016:
SAS data table 1

Table STUDY2017:
SAS data table 2

If we look at the tables’ variable properties, we will see that the subjid variable is of different type in these two data tables: it is of type Numeric (length of 8) in STUDY2016 and of type Character (length of 6) in STUDY2017:

SAS variables properties

Also, notice that variables dob, start_date, and end_date, although of the same Numeric type, have different length attributes - 8 in the STUDY2016 table, and 4 in the STUDY2017 table.

Data table concatenating problem

If we try to concatenate these two tables using PROC APPEND, SAS will generate an ERROR:

proc append base=sasdl.study2016 data=sasdl.study2017;
run;
 
NOTE: Appending SASDL.STUDY2017 to SASDL.STUDY2016.
WARNING: Variable subjid not appended because of type mismatch.
WARNING: Variable dob has different lengths on BASE and DATA
         files (BASE 8 DATA 4).
WARNING: Variable start_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
WARNING: Variable end_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
ERROR: No appending done because of anomalies listed above.
       Use FORCE option to append these files.
NOTE: 0 observations added.

Even if we do use the FORCE option as the ERROR message suggests, the result will be disappointing:

proc append base=sasdl.study2016 data=sasdl.study2017 force;
run;
 
NOTE: Appending SASDL.STUDY2017 to SASDL.STUDY2016.
WARNING: Variable subjid not appended because of type mismatch.
WARNING: Variable dob has different lengths on BASE and DATA
         files (BASE 8 DATA 4).
WARNING: Variable start_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
WARNING: Variable end_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
NOTE: FORCE is specified, so dropping/truncating will occur.

The resulting data table will have missing values for the appended subjid:

Missing values in SAS table after PROC APPEND

Solution

In order to concatenate these tables, we must make the mismatching variable subjid of the same type in both data tables, either Character or Numeric. Making them both of Character type seems more robust, since it would allow for the values to contain both digit and non-digit characters. But if you know for sure that the value contains only digits, making them both Numeric works just as well.

Let’s say we decide to make them of Character type. Also note that our numeric variables representing dates (dob, start_date and end_date) are of different lengths: they are length 8 in STUDY2016 and length 4 in STUDY2017. Let’s make them the same length as well. From the standpoint of numerical accuracy in SAS for the dates, a length of 4 seems to be quite adequate to represent them accurately.

Let’s apply all our modifications to the STUDY 2016 dataset. Even though we are going to re-build the dataset in order to modify variable type and length, we are going to preserve the variable order so it feels like we just modified those variable attributes.

Here is how it can be done.

/* create macrovariable varlist containing a list of variable names */
proc sql noprint;
	select name into :varlist separated by ' '
	from sashelp.vcolumn
	where upcase(libname) eq 'SASDL' and upcase(memname) eq 'STUDY2016';
quit;
 
/* modify variable type and length */
data sasdl.study2016 (drop=v1-v4);
	retain &varlist; *<-- preserve variable order ;
	length subjid $6 dob start_date end_date 4; *<-- define new types/lengths ;
	format dob start_date end_date mmddyy10.;   *<-- recreate formats ;
	set sasdl.study2016 (rename=(subjid=v1 dob=v2 start_date=v3 end_date=v4));
	subjid = put(v1,6.); *<-- redefine subjid variable ;
	dob = v2;            *<-- redefine dob variable ;
	start_date = v3;     *<-- redefine start_date variable ;
	end_date = v4;       *<-- redefine end_date variable ;
run;
 
/* make sure new concatenated file (study_all) does not exist */
proc datasets library=sasdl nolist;
	delete study_all;
quit;
 
/* append both (study2016 and study2017) to study_all */
proc append base=sasdl.study_all data=sasdl.study2016;
run;
proc append base=sasdl.study_all data=sasdl.study2017;
run;

In this code, first, using proc sql and SAS view sashelp.vcolumn, we create a macro variable varlist to hold the list of all the variable names in our table, sasdl.study2016.

Then in the data step, we use a retain statement to preserve the variable order. When we read the sasdl.study2016 dataset using the set statement, we rename our variables-to-be-modified to some temporary names (e.g. v1 – v4) which we eventually drop in the data statement.

Then we re-assign the values of those temporary variables to the original variable names, thereby essentially creating new variables with new type and length. Since these new variables are named exactly as the old ones, the effect is as if their type and length attributes where modified, while in fact the whole table was rebuilt and replaced. Problem solved.

When we concatenate the data tables we create a new table sasdl.study_all. Before concatenating our two tables using proc append twice, we use proc datasets to delete that new table first. Even if the table does not exist, proc datasets will at least attempt to delete it. With all the seeming redundancy of this step, you will definitely appreciate it when you try running this code more than one time.

Changing variable type and variable length in SAS datasets was published on SAS Users.

3月 282017
 

I have been using the SAS Viya environment for just over six months now and I absolutely love it.  As a long-time SAS coder and data scientist I’m thrilled with the speed and greater accuracy I’m getting out of a lot of the same statistical techniques I once used in SAS9.  So why would a data scientist want to switch over to the new SAS Viya platform? The simple response is “better, faster answers.”  There are some features that are endemic to the SAS Viya architecture that provide advantages, and there are also benefits specific to different products as well.  So, let me try to distinguish between these.

SAS Viya Platform Advantages

To begin, I want to talk about the SAS Viya platform advantages.  For data processing, SAS Viya uses something called the CAS (Cloud Analytic Services) server – which takes the place of the SAS9 workspace server.  You can still use your SAS9 installation, as SAS has made it easy to work between SAS9 and SAS Viya using SAS/CONNECT, a feature that will be automated later in 2017.

Parallel Data Loads

One thing I immediately noticed was the speed with which large data sets are loaded into SAS Viya memory.  Using Hadoop, we can stage input files in either HDFS or Hive, and CAS will lift that data in parallel into its pooled memory area.  The same data conversion is occurring, like what happened in SAS9, but now all available processors can be applied to load the input data simultaneously.  And speaking of RAM, not all of the data needs to fit exactly into memory as it did with the LASR and HPA procedures, so much larger data sets can be processed in SAS Viya than you might have been able to handle before.

Multi-threaded DATA step

After initially loading data into SAS Viya, I was pleased to learn that the SAS DATA step is multi-threaded.  Most of your SAS9 programs will run ‘as is,’ however the multi-processing really only kicks in when the system finds explicit BY statements or partition statements in the DATA step code.  Surprisingly, you no longer need to sort your data before using BY statements in Procs or DATA steps.  That’s because there is no Proc Sort anymore – sorting is a thing of the past and certainly takes some getting used to in SAS Viya.  So for all of those times where I had to first sort data before I could use it, and then execute one or more DATA steps, that all transforms into a more simplified code stream.   Steven Sober has some excellent code examples of the DATA step running in full-distributed mode in his recent article.

Open API’s

While all of SAS Viya’s graphical user interfaces are designed with consistency of look and feel in mind, the R&D teams have designed it to allow almost any front-end or REST service submit commands and receive results from either CAS or its corresponding micro-service architecture.  Something new I had to learn was the concept of a CAS action set.  CAS action sets are comprised of a number of separate actions which can be executed singly or with other actions belonging to the same set.  The cool thing about CAS actions is that there is one for almost any task you can think about doing (kind of like a blend between functions and Procs in SAS9).  In fact, all of the visual interfaces SAS deploys utilize CAS actions behind the scenes and most GUI’s will automatically generate code for you if you do not want to write it.

But the real beauty of CAS actions is that you can submit them through different coding interfaces using the open Application Programming Interface’s (API’s) that SAS has written to support external languages like Python, Java, Lua and R (check out Github on this topic).  The standardization aspect of using the same CAS action within any type of external interface looks like it will pay huge dividends to anyone investing in this approach.

Write it once, re-use it elsewhere

I think another feature that old and new users alike will adore is the “write-it-once, re-use it” paradigm that CAS actions support.  Here’s an example of code that was used in Proc CAS, and then used in Jupyter notebook using Python, followed by a R/REST example.

Proc CAS

proc cas;
dnnTrain / table={name  = 'iris_with_folds'
                   where = '_fold_ ne 19'}
 modelWeights = {name='dl1_weights', replace=true}
 target = "species"
 hiddens = {10, 10} acts={'tanh', 'tanh'}
 sgdopts = {miniBatchSize=5, learningRate=0.1, 
                  maxEpochs=10};
run;

 

Python API

s.dnntrain(table = {‘name’: 'iris_with_folds’,
                                  ‘where’: '_fold_ ne 19},
   modelweights = {‘name’: 'dl1_weights', ‘replace’: True}
   target  = "species"
   hiddens  = [10, 10], acts=['tanh', ‘tanh']
   sgdopts  = {‘miniBatchSize’: 5, ‘learningRate’: 0.1, 
                      ‘maxEpochs’: 10})

 

R API

cas.deepNeural.dnnTrain(s,
  table = list(name = 'iris_with_folds’
                   where = '_fold_ ne 19’),
  modelweights = list({name='dl1_weights', replace=T),
  target = "species"
  hiddens = c(10, 10), acts=c('tanh', ‘tanh‘)
  sgdopts = list(miniBatchSize = 5, learningRate = 0.1,
                   maxEpochs=10))

 

See how nearly identical each of these three are to one another?  That is the beauty of SAS Viya.  Using a coding approach like this means that I do not need to rely exclusively on finding SAS coding talent anymore.  Younger coders who usually know several open source languages take one look at this, understand it, and can easily incorporate it into what they are already doing.  In other words, they can stay in coding environments that are familiar to them, whilst learning a few new SAS Viya objects that dramatically extend and improve their work.

Analytics Procedure Advantages

Auto-tuning

Next, I want address some of the advantages in the newer analytics procedures.  One really great new capability that has been added is the auto-tuning feature for some machine learning modeling techniques, specifically (extreme) gradient boosting, decision tree, random forest, support vector machine, factorization machine and neural network.  This capability is something that is hard to find in the open source community, namely the automatic tuning of major option settings required by most iterative machine learning techniques.  Called ‘hyperspace parameters’ among data scientists, SAS has built-in optimizing routines that try different settings and pick the best ones for you (in parallel!!!).  The process takes longer to run initially, but, wow, the increase in accuracy without going through the normal model build trial-and-error process is worth it for this amazing feature!

Extreme Gradient Boosting, plus other popular ML techniques

Admittedly, xgboost has been in the open source community for a couple of years already, but SAS Viya has its own extreme[1] gradient boosting CAS action (‘gbtreetrain’) and accompanying procedure (Gradboost).  Both are very close to what Chen (2015, 2016) originally developed, yet have some nice enhancements sprinkled throughout.  One huge bonus is the auto-tuning feature I mentioned above.  Another set of enhancements include: 1) a more flexible tree-splitting methodology that is not limited to CART (binary tree-splitting), and 2) the handling of nominal input variables is done automatically for you, versus ‘one-hot-encoding’ you need to perform in most open source tools.  Plus, lots of other detailed option settings for fine tuning and control.

In SAS Viya, all of the popular machine learning techniques are there as well, and SAS makes it easy for you to explore your data, create your own model tournaments, and generate score code that is easy to deploy.  Model management is currently done through SAS9 (at least until the next SAS Viya release later this year), but good, solid links are provided between SAS Viya and SAS9 to make transferring tasks and output fairly seamless.  Check out the full list of SAS Viya analytics available as of March 2017.

In-memory forecasting

It is hard to beat SAS9 Forecast Server with its unique 12 patents for automatic diagnosing and generating forecasts, but now all of those industry-leading innovations are also available in SAS Viya’s distributed in-memory environment. And by leveraging SAS Viya’s optimized data shuffling routines, time series data does not need to be sorted, yet it is quickly and efficiently distributed across the shared memory array. The new architecture also has given us a set of new object packages to make more efficient use of the data and run faster than anything witnessed before. For example, we have seen 1.5 million weekly time series with three years of history take 130 hours (running single-machine and single-threaded) and reduce that down to run in 5 minutes on a 40 core networked array with 10 threads per core. Accurately forecasting 870 Gigabytes of information, in 5 minutes?!? That truly is amazing!

Conclusions

Though I first ventured into SAS Viya with some trepidation, it soon became clear that the new platform would fundamentally change how I build models and analytics.  In fact, the jumps in performance and the reductions in time spent to do routine work has been so compelling for me, I am having a hard time thinking about going back to a pure SAS9 environment.  For me it’s all about getting “better, faster answers,” and SAS Viya allows me to do just that.   Multi-threaded processing is the way of the future and I want to be part of that, not only for my own personal development, but also because it will help me achieve things for my customers they may not have thought possible before.  If you have not done so already, I highly recommend you to go out and sign up for a free trial and check out the benefits of SAS Viya for yourself.


[1] The definition of ‘extreme’ refers only to the distributed, multi-threaded aspect of any boosting technique.

References

Chen , Tianqi and Carlos Guestrin , “XGBoost: Reliable Large-scale Tree Boosting System”, 2015

Chen , Tianqi and Carlos Guestrin, “XGBoost: A Scalable Tree Boosting System”, 2016

Using the Robust Analytics Environment of SAS Viya was published on SAS Users.

3月 222017
 

Editor's note: The following post is from Scott Leslie, PhD, Manager of Advanced Analytics for MedImpact Healthcare Systems, Inc. Scott will be one of the Code Doctors at SAS Global Forum 2017.

Learn more about Scott.

VISIT THE CODE CLINIC AT SASGF 2017

$0 copay, no deductible.  No waiting rooms, no outdated magazines. What kind of doctor’s office is this? While we might not be able to help with that nasty cough, SAS Code Doctors are here to help – when it comes to your SAS code, that is.

Yes, the Code Doctors return to SAS Global Forum 2017! This year the Code Clinic will have over 20 SAS experts on-call to answer your questions on syntax, SAS Solutions, best practices and concepts across a broad range of SAS topics/applications, including Base SAS, macros, report writing, ODS, SQL, SAS Enterprise Guide, statistics, and more. It’s a fantastic opportunity to review code, ask questions, develop and brainstorm with peers who have decades of experience using SAS. Bring your code on paper, a flash drive, or a laptop. We’ll have 3-4 laptops with several versions of SAS software installed: 9.1.3 to 9.4 and EG 4.1 to 7.1. And if we can’t answer your coding question at the clinic, we can easily refer you to a specialist, namely the SAS R&D section of the Quad.

So, take advantage of this personalized learning experience in the Lower Quad area of the conference. Clinic office hours are:

  • Monday 4/3, 10:00 am - 3:30 pm
  • Tuesday 4/3, 9:30 am – 2:00 pm and 3:30 pm – 6:00 pm

Here’s the detailed schedule of our all-star code doctor lineup. If you haven’t heard of these names yet, you have now...

/*Just by reading this blog…*/.

 

About Scott Leslie

Scott Leslie, PhD, is Manager of Advanced Analytics for MedImpact Healthcare Systems, Inc. with over15 years of SAS® experience in the pharmacy benefits and medical management field. His SAS knowledge areas include SAS/STAT, Enterprise Guide, and Visual Analytics. Scott presents at local, regional and international SAS user group conferences as well at various clinical and scientific conferences. He is a former executive committee member of the Western Users of SAS Software (WUSS) and contributes to the San Diego SAS Users’ Group (SANDS).

Visit the code clinic at SAS Global Forum was published on SAS Users.