tips & techniques

5月 142019
 

Interested in making business decisions with big data analytics? Our Wiley SAS Business Series book Profit Driven Business Analytics: A Practitioner’s Guide to Transforming Big Data into Added Value by Bart Baesens, Wouter Verbeke, and Cristian Danilo Bravo Roman has just the information you need to learn how to use SAS to make data and analytics decision-making a part of your core business model!

This book combines the authorial team’s worldwide consulting experience and high-quality research to open up a road map to handling data, optimizing data analytics for specific companies, and continuously evaluating and improving the entire process.

In the following excerpt from their book, the authors describe a value-centric strategy for using analytics to heighten the accuracy of your enterprise decisions:

“'Data is the new oil' is a popular quote pinpointing the increasing value of data and — to our liking — accurately characterizes data as raw material. Data are to be seen as an input or basic resource needing further processing before actually being of use.”

Analytics process model

In our book, we introduce the analytics process model that describes the iterative chain of processing steps involved in turning data into information or decisions, which is quite similar actually to an oil refinery process. Note the subtle but significant difference between the words data and information in the sentence above. Whereas data fundamentally can be defined to be a sequence of zeroes and ones, information essentially is the same but implies in addition a certain utility or value to the end user or recipient.

So, whether data are information depends on whether the data have utility to the recipient. Typically, for raw data to be information, the data first need to be processed, aggregated, summarized, and compared. In summary, data typically need to be analyzed, and insight, understanding, or knowledge should be added for data to become useful.

Applying basic operations on a dataset may already provide useful insight and support the end user or recipient in decision making. These basic operations mainly involve selection and aggregation. Both selection and aggregation may be performed in many ways, leading to a plentitude of indicators or statistics that can be distilled from raw data. Providing insight by customized reporting is exactly what the field of business intelligence (BI) is about.

Business intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to — and analysis of — information to improve and optimize decisions and performance.

This model defines the subsequent steps in the development, implementation, and operation of analytics within an organization.

    Step 1
    As a first step, a thorough definition of the business problem to be addressed is needed. The objective of applying analytics needs to be unambiguously defined. Some examples are: customer segmentation of a mortgage portfolio, retention modeling for a postpaid Telco subscription, or fraud detection for credit cards. Defining the perimeter of the analytical modeling exercise requires a close collaboration between the data scientists and business experts. Both parties need to agree on a set of key concepts; these may include how we define a customer, transaction, churn, or fraud. Whereas this may seem self-evident, it appears to be a crucial success factor to make sure a common understanding of the goal and some key concepts is agreed on by all involved stakeholders.

    Step 2
    Next, all source data that could be of potential interest need to be identified. The golden rule here is: the more data, the better! The analytical model itself will later decide which data are relevant and which are not for the task at hand. All data will then be gathered and consolidated in a staging area which could be, for example, a data warehouse, data mart, or even a simple spreadsheet file. Some basic exploratory data analysis can then be considered using, for instance, OLAP facilities for multidimensional analysis (e.g., roll-up, drill down, slicing and dicing).

    Step 3
    After we move to the analytics step, an analytical model will be estimated on the preprocessed and transformed data. Depending on the business objective and the exact task at hand, a particular analytical technique will be selected and implemented by the data scientist.

    Step 4
    Finally, once the results are obtained, they will be interpreted and evaluated by the business experts. Results may be clusters, rules, patterns, or relations, among others, all of which will be called analytical models resulting from applying analytics. Trivial patterns (e.g., an association rule is found stating that spaghetti and spaghetti sauce are often purchased together) that may be detected by the analytical model is interesting as they help to validate the model. But of course, the key issue is to find the unknown yet interesting and actionable patterns (sometimes also referred to as knowledge diamonds) that can provide new insights into your data that can then be translated into new profit opportunities!

    Step 5
    Once the analytical model has been appropriately validated and approved, it can be put into production as an analytics application (e.g., decision support system, scoring engine). Important considerations here are how to represent the model output in a user-friendly way, how to integrate it with other applications (e.g., marketing campaign management tools, risk engines), and how to make sure the analytical model can be appropriately monitored and back-tested on an ongoing basis.

Book giveaway!

If you are as excited about business analytics as we are and want a copy of Bart Baesens’ book Profit Driven Business Analytics: A Practitioner’s Guide to Transforming Big Data into Added Value, enter to win a free copy in our book giveaway today! The first 5 commenters to correctly answer the question below get a free copy of Baesens book! Winners will be contacted via email.

Here's the question:
What Free SAS Press e-book did Bart Baesens write the foreword too?

We look forward to your answers!

Further resources

Want to prove your business analytics skills to the world? Check out our Statistical Business Analyst Using SAS 9 certification guide by Joni Shreve and Donna Dea Holland! This certification is designed for SAS professionals who use SAS/STAT software to conduct and interpret complex statistical data analysis.

For more information about the certification and certification prep guide, watch this video from co-author Joni Shreve on their SAS Certification Prep Guide: Statistical Business Analysis Using SAS 9.

Big data in business analytics: Talking about the analytics process model was published on SAS Users.

5月 142019
 

Interested in making business decisions with big data analytics? Our Wiley SAS Business Series book Profit Driven Business Analytics: A Practitioner’s Guide to Transforming Big Data into Added Value by Bart Baesens, Wouter Verbeke, and Cristian Danilo Bravo Roman has just the information you need to learn how to use SAS to make data and analytics decision-making a part of your core business model!

This book combines the authorial team’s worldwide consulting experience and high-quality research to open up a road map to handling data, optimizing data analytics for specific companies, and continuously evaluating and improving the entire process.

In the following excerpt from their book, the authors describe a value-centric strategy for using analytics to heighten the accuracy of your enterprise decisions:

“'Data is the new oil' is a popular quote pinpointing the increasing value of data and — to our liking — accurately characterizes data as raw material. Data are to be seen as an input or basic resource needing further processing before actually being of use.”

Analytics process model

In our book, we introduce the analytics process model that describes the iterative chain of processing steps involved in turning data into information or decisions, which is quite similar actually to an oil refinery process. Note the subtle but significant difference between the words data and information in the sentence above. Whereas data fundamentally can be defined to be a sequence of zeroes and ones, information essentially is the same but implies in addition a certain utility or value to the end user or recipient.

So, whether data are information depends on whether the data have utility to the recipient. Typically, for raw data to be information, the data first need to be processed, aggregated, summarized, and compared. In summary, data typically need to be analyzed, and insight, understanding, or knowledge should be added for data to become useful.

Applying basic operations on a dataset may already provide useful insight and support the end user or recipient in decision making. These basic operations mainly involve selection and aggregation. Both selection and aggregation may be performed in many ways, leading to a plentitude of indicators or statistics that can be distilled from raw data. Providing insight by customized reporting is exactly what the field of business intelligence (BI) is about.

Business intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to — and analysis of — information to improve and optimize decisions and performance.

This model defines the subsequent steps in the development, implementation, and operation of analytics within an organization.

    Step 1
    As a first step, a thorough definition of the business problem to be addressed is needed. The objective of applying analytics needs to be unambiguously defined. Some examples are: customer segmentation of a mortgage portfolio, retention modeling for a postpaid Telco subscription, or fraud detection for credit cards. Defining the perimeter of the analytical modeling exercise requires a close collaboration between the data scientists and business experts. Both parties need to agree on a set of key concepts; these may include how we define a customer, transaction, churn, or fraud. Whereas this may seem self-evident, it appears to be a crucial success factor to make sure a common understanding of the goal and some key concepts is agreed on by all involved stakeholders.

    Step 2
    Next, all source data that could be of potential interest need to be identified. The golden rule here is: the more data, the better! The analytical model itself will later decide which data are relevant and which are not for the task at hand. All data will then be gathered and consolidated in a staging area which could be, for example, a data warehouse, data mart, or even a simple spreadsheet file. Some basic exploratory data analysis can then be considered using, for instance, OLAP facilities for multidimensional analysis (e.g., roll-up, drill down, slicing and dicing).

    Step 3
    After we move to the analytics step, an analytical model will be estimated on the preprocessed and transformed data. Depending on the business objective and the exact task at hand, a particular analytical technique will be selected and implemented by the data scientist.

    Step 4
    Finally, once the results are obtained, they will be interpreted and evaluated by the business experts. Results may be clusters, rules, patterns, or relations, among others, all of which will be called analytical models resulting from applying analytics. Trivial patterns (e.g., an association rule is found stating that spaghetti and spaghetti sauce are often purchased together) that may be detected by the analytical model is interesting as they help to validate the model. But of course, the key issue is to find the unknown yet interesting and actionable patterns (sometimes also referred to as knowledge diamonds) that can provide new insights into your data that can then be translated into new profit opportunities!

    Step 5
    Once the analytical model has been appropriately validated and approved, it can be put into production as an analytics application (e.g., decision support system, scoring engine). Important considerations here are how to represent the model output in a user-friendly way, how to integrate it with other applications (e.g., marketing campaign management tools, risk engines), and how to make sure the analytical model can be appropriately monitored and back-tested on an ongoing basis.

Book giveaway!

If you are as excited about business analytics as we are and want a copy of Bart Baesens’ book Profit Driven Business Analytics: A Practitioner’s Guide to Transforming Big Data into Added Value, enter to win a free copy in our book giveaway today! The first 5 commenters to correctly answer the question below get a free copy of Baesens book! Winners will be contacted via email.

Here's the question:
What Free SAS Press e-book did Bart Baesens write the foreword too?

We look forward to your answers!

Further resources

Want to prove your business analytics skills to the world? Check out our Statistical Business Analyst Using SAS 9 certification guide by Joni Shreve and Donna Dea Holland! This certification is designed for SAS professionals who use SAS/STAT software to conduct and interpret complex statistical data analysis.

For more information about the certification and certification prep guide, watch this video from co-author Joni Shreve on their SAS Certification Prep Guide: Statistical Business Analysis Using SAS 9.

Big data in business analytics: Talking about the analytics process model was published on SAS Users.

5月 102019
 

May 12th is #NationalLimerickDay! If you saw our Valentine’s Day poem, you know we at SAS Press love creating poems and fun rhymes, so check out our limericks below!

So, what’s a limerick?

National Limerick Day is observed each year on May 12th and honors the birthday of the famed English artist, illustrator, author and poet Edward Lear (May 12, 1812 – Jan. 29, 1888). Lear’s poetry is most famous for its nonsense or absurdity, and mostly consists of prose and limericks.

His book, “Book of Nonsense,” published in 1846 popularized the limerick poem.

A limerick poem has five lines and is often very short, humorous, and full of nonsense. To create a limerick the first two lines must rhyme with the fifth line, and the third and fourth lines rhyme together. The limerick’s rhythm is officially described as anapestic meter.

To celebrate, we want to ask all lovers of SAS books to enjoy the limericks written by us and to see if you can create your own! Can you top our limericks on our love for SAS Books? Check out our handy how-to limerick links below.

Our limericks

There once was a software named SAS
helping tons of analysts complete tasks.
a Text Analytics book to extract meaning as data flies by
and a Portfolio and Investment Analysis book so you’ll never go awry.
You know our SAS books are first-class!

We enjoyed meeting our awesome users at SAS Global Forum
who enjoy our books with true decorum.
a SAS Administration book on building from the ground up
and a new book about PROC SQL you need to pick-up.
Checkout our SAS books today, you’ll adore ‘em!

For more about SAS Books and some more of our SAS Press fun, subscribe to our newsletter. You’ll get all the latest news and exclusive newsletter discounts. Also check out all our new SAS books at our online bookstore.

Resources:
Wiki-How: How to Write A Limerick
Limerick Generator: Create a Limerick in Seconds

Happy National Limerick Day from SAS Press! was published on SAS Users.

5月 102019
 

May 12th is #NationalLimerickDay! If you saw our Valentine’s Day poem, you know we at SAS Press love creating poems and fun rhymes, so check out our limericks below!

So, what’s a limerick?

National Limerick Day is observed each year on May 12th and honors the birthday of the famed English artist, illustrator, author and poet Edward Lear (May 12, 1812 – Jan. 29, 1888). Lear’s poetry is most famous for its nonsense or absurdity, and mostly consists of prose and limericks.

His book, “Book of Nonsense,” published in 1846 popularized the limerick poem.

A limerick poem has five lines and is often very short, humorous, and full of nonsense. To create a limerick the first two lines must rhyme with the fifth line, and the third and fourth lines rhyme together. The limerick’s rhythm is officially described as anapestic meter.

To celebrate, we want to ask all lovers of SAS books to enjoy the limericks written by us and to see if you can create your own! Can you top our limericks on our love for SAS Books? Check out our handy how-to limerick links below.

Our limericks

There once was a software named SAS
helping tons of analysts complete tasks.
a Text Analytics book to extract meaning as data flies by
and a Portfolio and Investment Analysis book so you’ll never go awry.
You know our SAS books are first-class!

We enjoyed meeting our awesome users at SAS Global Forum
who enjoy our books with true decorum.
a SAS Administration book on building from the ground up
and a new book about PROC SQL you need to pick-up.
Checkout our SAS books today, you’ll adore ‘em!

For more about SAS Books and some more of our SAS Press fun, subscribe to our newsletter. You’ll get all the latest news and exclusive newsletter discounts. Also check out all our new SAS books at our online bookstore.

Resources:
Wiki-How: How to Write A Limerick
Limerick Generator: Create a Limerick in Seconds

Happy National Limerick Day from SAS Press! was published on SAS Users.

5月 062019
 

App security is at the top of mind for just about everybody – users, IT folks, business executives. Rightfully so. Mobile apps and the devices on which they reside tend to travel around, without any physical boundaries that encompass the traditional desktop computers.

In chatting with folks who are evaluating the SAS Visual Analytics app for their mobile devices, the conversation eventually winds up with a focus on security and the big question comes up:

How is this app secure?

Great question! Here’s a whirlwind tour of the security features that have been built into the SAS Visual Analytics app for Windows 10, Android, and iOS devices. The app is now a young kid and not a toddler anymore, it has been around for about six years. And during its growth journey, the app has been beefed up with rock-solid features to address security for Visual Analytics reports viewed from mobile devices.

Before we take a look at the security features in the app, here are a few things you should know:

    • The app is free.
    • No license is needed to use the app.
    • You can download it anytime from the app store, and try out the sample reports in the app.
    • If you already have SAS Visual Analytics deployed in your organization, you can connect to your server, add reports to the app, and start interacting with your reports from your smartphone or tablet. The Help available in the app walks you through these steps.

Now, let’s get back to security for Visual Analytics reports on mobile devices. Here are five things that make the Visual Analytics app robust and secure on mobile devices.

    1. Device Whitelisting: If you want to connect to your SAS Visual Analytics server from the app, your administrator will “whitelist” your mobile device. Your device is first registered as a valid device that can connect to the Visual Analytics server. The whitelist affects devices, not users. If you happen to lose your mobile device, your administrator can remove the device from the whitelist and prevent access to the reports and data. The option to “blacklist” devices is also available.
    2. Cached Reports: After you add Visual Analytics reports to your app, if you don’t want the report data to remain with the report in the app, your administrator can enable the cached report feature. Data is downloaded only when you open and view the report on your mobile device. When you close the report, that data is removed from the device. For enhanced security, thumbnail images for report tiles in your app will not display for cached reports.
    3. Passcode: To prevent anyone other than yourself from opening the Visual Analytics app, you can set a 4-digit passcode for the app. There are two kinds of passcodes: required and optional. A required passcode is mandated by the server – when you connect to the server, you will create a passcode. Then, whenever you open the app or view a report from that server, you must enter the passcode. An optional passcode, on the other hand, is a passcode that you choose to use to lock up the app – it is not required to access the server, it is needed only to open the app. In addition, there are several features for passcode use that solidify security and access to the app: time-out, lock-out and so forth. I’ll go over these features in an upcoming blog.
    4. SSL/HTTPS: If the Visual Analytics server is set up with SSL/HTTPS, the data viewed in the reports on your mobile device is encrypted.
    5. Offline: If you were offline for a specified number of days, you must sign into the server again. If you don’t, the app does not download reports, update reports, or open reports for viewing.

Cached Reports

One of the security features we just talked about was the cached report feature. Here’s how cached report thumbnails are displayed in the Visual Analytics app on Windows 10, without any images.

When you tap the thumbnail for the cached report, data is immediately downloaded and the report opens in the app for viewing and interaction:

When you close this cached report in the app, the data is removed from the device and the cached report thumbnail displays in the app without any images.

Thanks for joining me on this whirlwind security tour of the SAS Visual Analytics app. Now you know the many different security mechanisms that are in place to protect your organization’s data and reports accessed from the mobile app.

Five key security features in the SAS Visual Analytics app was published on SAS Users.

4月 252019
 

As a SAS programmer, you are asked to do many things with your data -- reading, writing, calculating, building interfaces, and occasionally sending data outside of SAS. One of the most popular outputs you may be tasked with creating is likely a Microsoft Excel workbook. Have you ever heard, “just send me the spreadsheet”?

For an internal project the task is easy, just open the SAS ODS EXCEL destination, run PROC PRINT, and close SAS ODS EXCEL and the workbook spreadsheet is ready. But if the workbook or the spreadsheet is to be delivered somewhere else you may need to spruce it up a bit. Of course, you can manually change virtually everything on the spreadsheet, but that takes lots of employee time. And if the spreadsheet is delivered on a periodic basis, you may not run it the same every time.

Saving you time and money

Suppose you run a PROC PRINT with a “BY” statement and produce a Microsoft Excel workbook with 100 pages. If each of those pages need to be printed and distributed to 100 clients by mail, do you want to be the person who changes each of those to print as a landscape printout? The SAS ODS Excel destination has over 125 options and sub-options that can perform various tasks while the workbook is being written. One such task sets the worksheet to print in “landscape” format.

As a programmer, I know that when I want to start a new project or learn new software, I look to two places in a book: the index and the table of contents. If I can think of a key word that might help me, I look to the index. But when searching general topics, I use the Table of Contents (TOC). It always frustrates me if the TOC is in alphabetical order, so I decided to write my TOC as groups of options and SAS commands that impacted similar parts or features of the Excel Workbook.

To see this in action, the bullet points I have listed below identify the major topic sections of the book. These are, in fact, chapter titles presented after the introduction:

    • ODS Tagset versus Destination
    • ODS Excel Destination Actions
    • Setting Excel Document Property Values
    • Options That Affect the Workbook
    • Arguments that Affect Output Features
    • Options That Affect Worksheet Features
    • Options That Affect Print Features
    • Column, Row, and Cell Features

Take a look inside

Allow me to describe each of these topics in a few words:

    ODS Tagset versus Destination
    Many people have used the SAS ODS Tagset EXCELXP and will find many parts of the SAS ODS EXCEL destination to be very similar in both syntax and function. A tagset is Proc template code that can be changed by the user, while a SAS ODS destination is a built-in feature of SAS, much like a PROC or FUNCTION that cannot be changed by the users. This section also describes the ID feature of ODS which allows you to write more than one EXCEL workbook at a time.

    ODS Excel Destination Actions
    This area describes ODS features that may not be exclusive to ODS EXCEL but are useful in finding and choosing SAS outputs to be processed.

    Setting Excel Document Property Values
    Here you are shown how to change the comments, keywords, author, title, and other parts of the Excel Property sheet.

    Options That Affect the Workbook
    This section of the book shows you how to name the workbook, create blank worksheets, create a table of contents or index of worksheets within the workbook, change worksheet tab colors, and other options.

    Arguments that Affect Output Features
    The output features described here include changing the output style (coloration of the worksheet sections), finding and using stylesheet anchors, building and using Cascading Style Sheets, changing the Dots Per Inch (DPI) of the output data and or graphs, and adding text to the worksheet.

    Options That Affect Worksheet Features
    SAS has many options that describe the output data. These include titles, footnotes, and byline text. Additionally, SAS can group output worksheets in many ways including by page, by proc, by table, by by-group, or even no separation at all. Data can also be set to “FITTOPAGE” or the height or width can be selected along with adding sheet names or labels to the EXCEL output worksheets.

    Options That Affect Print Features
    Excel has many print features like printing in “black and white” only, centering horizontally or vertically, landscape or portrait, draft quality or standard, selecting the print order of the data, selecting the area to print, EXCEL headers and EXCEL footnotes, and others. All of which SAS can adjust as the workbook is being written.

    Column, Row, and Cell Features
    Finally, SAS can adjust column and row features like adding filters, changing the widths and heights or rows and columns, hiding rows or columns, inserting formulas, and even placing the data somewhere other than row one column one.

Ready to see the full Table of Contents? Click here.

Ready, set, go

My book Exchanging Data From SAS® to Excel: The ODS Excel Destination expands upon the SAS documentation by giving full descriptions and examples including SAS code and EXCEL output for nearly every option and sub-option of the SAS ODS EXCEL software. In addition to this blog, check out a free chapter of my book to get started making your worksheets beautifully formatted. Get ready to follow the money and make your reports come out perfect for publication in no time!

Making the most of the ODS Excel destination was published on SAS Users.

4月 222019
 

Imagine a world where satisfying human-computer dialogues exist. With the resurgence of interest in natural language processing (NLP) and understanding (NLU) – that day may not be far off.

In order to provide more satisfying interactions with machines, researchers are designing smart systems that use artificial intelligence (AI) to develop better understanding of human requests and intent.

Last year, OpenAI used a machine learning technique called reinforcement learning to teach agents to design their own language. The AI agents were given a simple set of words and the ability to communicate with each other. They were then given a set of goals that were best achieved by cooperating (communicating) with other agents. The agents independently developed a simple ‘grounded’ language.

Grounded vs. inferred language


Human language is said to be grounded in experience. People grasp the meaning of many basic words by interaction – not by learning dictionary definitions by rote. They develop understanding in terms of sensory experience -- for example, words like red, heavy, above.

Abstract word meanings are built in relation to more concretely grounded terms. Grounding allows humans to acquire and understand words and sentences in context.

The opposite of a grounded language is an inferred language. Inferred languages derive meaning from the words themselves and not what they represent. In AI trained only on textual data, but not real-world representations, these methods lack true understanding of what the words mean.

What if the AI agent develops its own language we can’t understand?

It happens. Even if the researcher gives the agents simple English words the agent inevitably diverges to its own, unintelligible language. Recently researchers at Facebook, Google and OpenAI all experienced this phenomenon!

Agents are reward driven. If there is no reward for using English (or human language) then the agents will develop a more efficient shorthand for themselves.

That’s cool – why is that a problem?

When researchers at the Facebook Artificial Intelligence Research lab designed chatbots to negotiate with one another using machine learning, they had to tweak one of their models because otherwise the bot-to-bot conversation “led to divergence from human language as the agents developed their own language for negotiating.” They had to use what’s called a fixed supervised model instead.

The problem, there, is transparency. Machine learning techniques such as deep learning are black box technologies. A lot of data is fed into the AI, in this case a neural network, to train on and develop its own rules. The model is then fed new data which is used to spit out answers or information. The black box analogy is used because it is very hard, if not impossible in complex models, to know exactly how the AI derives the output (answers). If AI develops its own languages when talking to other AI, the transparency problem compounds. How can we fully trust an AI when we can’t follow how it is making its decisions and what it is telling other AI?

But it does demonstrate how machines are redefining people’s understanding of so many realms once believed to be exclusively human—like language. The Facebook researchers concluded that it offered a fascinating insight to human and machine language. The bots also proved to be very good negotiators, developing intelligent negotiating strategies.

These new insights, in turn, lead to smarter chatbots that have a greater understanding of the real world and the context of human dialog.

At SAS, we’re developing different ways to incorporate chatbots into business dashboards or analytics platforms. These capabilities have the potential to expand the audience for analytics results and attract new and less technical users.

“Chatbots are a key technology that could allow people to consume analytics without realizing that’s what they’re doing,” says Oliver Schabenberger, SAS Executive Vice President, Chief Operating Officer and Chief Technology Officer in a recent SAS Insights article. “Chatbots create a humanlike interaction that makes results accessible to all.” The evolution of NLP toward NLU has a lot of important implications for businesses and consumers alike.

Satisfying human-computer dialogues will soon exist, and will have applications in medicine, law, and the classroom-to name but a few. As the volume of unstructured information continues to grow exponentially, we will benefit from AI’s tireless ability to help us make sense of it all.

Further Resources:
Natural Language Processing: What it is and why it matters
White paper: Text Analytics for Executives: What Can Text Analytics Do for Your Organization?
SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, by Teresa Jade, Biljana Belamaric Wilsey, and Michael Wallis
Unstructured Data Analysis: Entity Resolution and Regular Expressions in SAS®, by Matthew Windham
SAS: What are chatbots?
Blog: Let’s chat about chatbots, by Wayne Thompson

Moving from natural language processing to natural language understanding was published on SAS Users.

4月 082019
 


As word spreads that SAS integrates with open source technologies, people are beginning to explore how to connect, interact with, and use SAS in new ways. More and more users are examining the possibilities and with this comes questions like: How do I code A, integrate B, and accomplish C?

Documentation is plentiful but is undergoing a makeover. People aren’t sure where to go for help – and that's why we're launching the SAS Developers Community, where you can gather to ask questions and get answers.

The community will mirror the activities in existing SAS Communities: Q&A, library articles, tips, technical discussions, etc. We migrated some content from other boards. For example, we moved the content from the Coding on SAS Viya board to the new community. Additionally, we scoured other boards for content that may be better aligned with developers and moved it. We also created some original content. Any good community needs participation by all, so read on and get the 411 on the new Developers Community.

Who is the target audience?

Developers – data scientists, application developers, analysts, programmers and administrators – who need to access SAS resources and/or run SAS procedures. This audience may or may not have SAS programming skills but need to access and analyze data using SAS.

What can developers expect to find?

The Developers Community provides a forum for collaboration, Q&A, and knowledge and resource sharing. The focus will be on developers using open source languages and technology. The community will create synergy between communities.sas.com, developer.sas.com, and github.com/sassoftware. SAS employees and external users will post how-to articles and other items of interest in the library section of the community. This community will not replace the SAS Programming Communities, rather, it will fill a void for non-SAS programmers who have a need/desire to interact with SAS.

When will the community launch?

The Developers Community is live! The site is public, and we've moved existing artifacts to the community. I am attending SAS Global Forum and will be available to answer questions about the new community from our booth in the Quad. Come by and see me!

Where will the community live?

The Developers Community exists on communities.sas.com, under the Developers Category.

Why do we need a community for developers?

Developers need a centralized place to share ideas, ask and answer questions, and discover resources. Currently developers lack a forum to work through things such as authentication, coding, API use, and integration issues. The community will encourage communication, engagement and leadership. Also, the Developers Community will be tightly integrated with the SAS Developers web site and SAS GitHub resources.

How do we go about creating the community?

After seeding the SAS Developer Community with existing discussions, we'll build out a group of SAS developer experts to help monitor the community. The true magic will happen as questions are asked, discussions transpire, and ideas are shared. But we need to your help too. Here is your call to action.

Share the community with your networks, buddies and even family members who may get something out of chatting it up about how to develop in SAS. The livelihood of the community hinges on user interaction. Our current and future users will thank you for it. And you may make a friend while you're at it.

Launching the Developers Community in SAS Communities was published on SAS Users.

4月 012019
 

dividing by zero with SAS

Whether you are a strong believer in the power of dividing by zero, agnostic, undecided, a supporter, denier or anything in between and beyond, this blog post will bring all to a common denominator.

History of injustice

For how many years have you been told that you cannot divide by zero, that dividing by zero is not possible, not allowed, prohibited? Let me guess: it’s your age minus 7 (± 2).

But have you ever been bothered by that unfair restriction? Think about it: all other numbers get to be divisors. All of them, including positive, negative, rational, even irrational and imaginary. Why such an injustice and inequality before the Law of Math?

We have our favorites like π, and prime members (I mean numbers), but zero is the bottom of the barrel, the lowest of the low, a pariah, an outcast, an untouchable when it comes to dividing by. It does not even have a sign in front of it. Well, it’s legal to have, but it’s meaningless.

And that’s not all. Besides not being allowed in a denominator, zeros are literally discriminated against beyond belief. How else could you characterize the fact that zeros are declared as pathological liars as their innocent value is equated to FALSE in logical expressions, while all other more privileged numbers represent TRUE, even the negative and irrational ones!

Extraordinary qualities of zeros

Despite their literal zero value, their informational value and qualities are not less than, and in many cases significantly surpass those of their siblings. In a sense, zero is a proverbial center of the universe, as all the other numbers dislocated around it as planets around the sun. It is not coincidental that zeros are denoted as circles, which makes them forerunners and likely ancestors of the glorified π.

Speaking of π, what is all the buzz around it? It’s irrational. It’s inferior to 0: it takes 2 π’s to just draw a single zero (remember O=2πR?). Besides, zeros are not just well rounded, they are perfectly rounded.

Privacy protection experts and GDPR enthusiasts love zeros. While other small numbers are required to be suppressed in published demographical reports, zeros may be shown prominently and proudly as they disclose no one’s personally identifiable information (PII).

No number rivals zero. Zeros are perfect numerators and equalizers. If you divide zero by any non-zero member of the digital community, the result will always be zero. Always, regardless of the status of that member. And yes, zeros are perfect common denominators, despite being prohibited from that role for centuries.

Zeros are the most digitally neutral and infinitely tolerant creatures. What other number has tolerated for so long such abuse and discrimination!

Enough is enough!

Dividing by zero opens new horizons

Can you imagine what new opportunities will arise if we break that centuries-old tradition and allow dividing by zero? What new horizons will open! What new breakthroughs and discoveries can be made!

With no more prejudice and prohibition of the division by zero, we can prove virtually anything we wish. For example, here is a short 5-step mathematical proof of “4=5”:

1)   4 – 4 = 10 – 10
2)   22 – 22 = 5·(2 – 2)
3)   (2 + 2)·(2 – 2) = 5·(2 – 2) /* here we divide both parts by (2 – 2), that is by 0 */
4)   (2 + 2) = 5
5)   4 = 5

Let’s make the next logical step. If dividing by zero can make any wish a reality, then producing a number of our choosing by dividing a given number by zero scientifically proves that division by zero is not only legitimate, but also feasible and practical.

As you will see below, division by zero is not that easy, but with the power of SAS, the power to know and the powers of curiosity, imagination and perseverance nothing is impossible.

Division by zero - SAS implementation

Consider the following use case. Say you think of a “secret” number, write it on a piece of paper and put in a “secret” box. Now, you take any number and divide it by zero. If the produced result – the quotient – is equal to your secret number, wouldn’t it effectively demonstrate the practicality and magic power of dividing by zero?

Here is how you can do it in SAS. A relatively “simple”, yet powerful SAS macro %DIV_BY_0 takes a single number as a numerator parameter, divides it by zero and returns the result equal to the one that is “hidden” in your “secret” box. It is the ultimate, pure artificial intelligence, beyond your wildest imagination.

All you need to do is to run this code:

 
data MY_SECRET_BOX;        /* you can use any dataset name here */
   MY_SECRET_NUMBER = 777; /* you can use any variable name here and assign any number to it */
run;
 
%macro DIV_BY_0(numerator);
 
   %if %sysevalf(&numerator=0) %then %do; %put 0:0=1; %return; %end;
   %else %let putn=&sysmacroname; 
   %let %sysfunc(putn(%substr(&putn,%length(&putn)),words.))=
   %sysevalf((&numerator/%sysfunc(constant(pi)))**&sysrc);  
   %let a=com; %let null=; %let nu11=%length(null); 
   %let com=*= This is going to be an awesome blast! ;
   %let %substr(&a,&zero,&zero)=*Close your eyes and open your mind, then;
   %let imagine = "large number like 71698486658278467069846772 Bytes divided by 0";
   %let O=%scan(%quote(&c),&zero+&nu11); 
   %let l=%scan(%quote(&c),&zero);
   %let _=%substr(%scan(&imagine,&zero+&nu11),&zero,&nu11);
   %let %substr(&a,&zero,&zero)%scan(&&&a,&nu11+&nu11-&zero)=%scan(&&&a,-&zero,!b)_;
   %do i=&zero %to %length(%scan(&imagine,&nu11)) %by &zero+&zero;
   %let null=&null%sysfunc(&_(%substr(%scan(&imagine,&nu11),&i,&zero+&zero))); %end;
   %if &zero %then %let _0=%scan(&null,&zero+&zero); %else;
   %if &nu11 %then %let _O=%scan(&null,&zero);
   %if %qsysfunc(&O(_&can)) %then %if %sysfunc(&_0(&zero)) %then %put; %else %put;
   %put &numerator:0=%sysfunc(&_O(&zero,&zero));
   %if %sysfunc(&l(&zero)) %then;
 
%mend DIV_BY_0;
 
%DIV_BY_0(55); /* parameter may be of any numeric value */

When you run this code, it will produce in the SAS LOG your secret number:

55:0=777

How is that possible without the magic of dividing by zero? Note that the %DIV_BY_0 macro has no knowledge of your dataset name, nor the variable name holding your secret number value to say nothing about your secret number itself.

That essentially proves that dividing by zero can practically solve any imaginary problem and make any wish or dream come true. Don’t you agree?

There is one limitation though. We had to make this sacrifice for the sake of numeric social justice. If you invoke the macro with the parameter of 0 value, it will return 0:0=1 – not your secret number - to make it consistent with the rest of non-zero numbers (no more exceptions!): “any number, except zero, divided by itself is 1”.

Challenge

Can you crack this code and explain how it does it? I encourage you to check it out and make sure it works as intended. Please share your thoughts and emotions in the Comments section below.

Disclosure

This SAS code contains no cookies, no artificial sweeteners, no saturated fats, no psychotropic drugs, no illicit substances or other ingredients detrimental to your health and integrity, and no political or religious statements. It does not collect, distribute or sell your personal data, in full compliance with FERPA, HIPPA, GDPR and other privacy laws and regulations. It is provided “as is” without warranty and is free to use on any legal SAS installation. The whole purpose of this blog post and the accompanied SAS programming implementation is to entertain you while highlighting the power of SAS and human intelligence, and to fool around in the spirit of the date of this publication.

Dividing by zero with SAS was published on SAS Users.

3月 072019
 

As of December 2018, any customer with a valid SAS Viya order is able to package and deploy their SAS Viya software in Docker containers. SAS has provided a fully documented and supported project (or “recipe”) for easily building these containers. So how can you start? You can simply stop reading this article and go directly to the GitHub repository and follow the instructions there. Otherwise, in this article, Jeff Owens, a solutions architect at SAS, provides a little color commentary around the process in case it is helpful…

First of all, what is the point of these containers?

Well, at its core, remember SAS and it’s massively parallel, in-memory counterpart, Cloud Analytic Services (CAS) is a powerful runtime for data processing and analytics. A runtime simply being an engine responsible for processing and executing a particular type of code (i.e. SAS code). Traditionally, the SAS runtime would live on a centralized server somewhere and users would submit their “jobs” to that SAS runtime (server) in a variety of ways. The SAS server supports a number of different products, tasks, etc. – but for this discussion let’s just focus on the scenario where a job here is a “.sas” file, perhaps developed in an IDE-like Enterprise Guide or SAS Studio, and submitted to the SAS runtime engine via the IDE itself, a bash shell, or maybe even SAS’ enterprise grade scheduler and job management solution – SAS Grid. In these cases, the SAS and CAS servers are on dedicated, always-on physical servers.

The brave new containerized world in which we live provides us a new deployment model: submit the job and create the runtime server at the same time. Plus, only consume the exact resources from the host machine or the Kubernetes cluster the specific job requires. And when the job finishes, release those resources for others to use. Kubernetes and PaaS clusters are quite likely shared environments, and one of the major themes in the rise of the containers is the further abstraction between hardware and software. Some of that may be easier said than done, particularly for customers with very large volumes of jobs to manage, but it is indeed possible today with SAS Viya on Docker/Kubernetes.

Another effective (and more immediate) usage of this containerized version of SAS Viya is simply an adhoc, on-demand, temporary development environment. The container package includes SAS Studio, so one can quickly spin up a full SAS Viya programming sandbox – SAS Studio as well as the SAS & CAS runtimes. Here they can develop and test SAS code, and just as quickly tear the environment down when no longer needed. This is useful for users that: (a) don’t have access to an “always-on” environment for whatever reason, (b) want to try out experimental code that could potentially consume resources from a shared "always-on" sas environment, and/or (c) maybe their Kubernetes cluster has many more resources available than their always-on and they want to try a BIG job.

Yes, it is possible to deploy the entire SAS Viya stack (microservices and all) via Kubernetes but that discussion is for another day. This post focuses strictly on the SAS Viya programming components and running on a single machine Docker host rather than a Kubernetes cluster.

Build the container image

I will begin here with a fresh single machine RHEL 7.5 server running on Openstack. But this machine could have been running on any cloud or VM platform, and I could use any (modern enough) flavor of Linux thanks to how Docker works. My machine here has 8cpu, 16GB RAM, and a 50GB root volume. Less or more is fine. A couple of notes to help understand how to configure an instance:

  • The final docker container image we will end up with will be ~10GB in size and like all docker images will live in /var/lib/docker/images by default.
    • Yes, that is large for a container. Most of this size is just static bins and libs that support the very developed SAS language. Compare to an Anaconda image which is ~3.6GB.
  • As for RAM, remember any tables loaded to CAS are loaded to memory (and will swap to disk as needed). So, your memory choice should be directly dependent on the data sizes you expect to work with.
  • Similar story for cores – CAS code is multithreaded, so more cores = more parallelization.

The first step is to install Docker.

Following along with sas-container-recipes now, the first thing I should do is mirror the repo for my order. Note, this is not a required step – you could build this container directly from SAS repos if you wanted, but we’ll mirror as a best practice. We could simply mirror and serve it over the local filesystem of our build host, but since I promised color I’ll serve it over the web instead. So, these commands run on a separate RHEL server. If you choose to mirror on your build host, make sure you have the disk space (~30GB should be plenty). You will also need your SAS_Viya_deployment_data.zip file available on the SAS Customer Support site. Run the following code to execute the setup.

$ wget https://support.sas.com/installation/viya/34/sas-mirror-manager/lax/mirrormgr-linux.tgz
$ tar xf mirrormgr-linux.tgz
$ rm -f mirrormgr-linux.tgz
$ mkdir -p /repos/viyactr
$ mirrormgr mirror –deployment-data SAS_Viya_deployment_data.zip –path /repos/viyactr –platform x64-redhat-linux-6 –latest
$ yum install httpd -y
$ system start httpd
$ systemctl enable httpd
$ ln -s /repos/viyactr /var/www/html/sas_repo

Next, I go ahead and clone the sas-containers-recipes repo locally and upload my SAS-Viya-deployment-data.zip file and I am ready to run the build command. As a bonus, I am also going to use my site’s (SAS’) sssd.conf file so my container will use our corporate Active Directory for authentication. If you do not need or want that integration you can skip the “vi addons/sssd.conf” line and change the “--addons” option to “addons/auth-demo” so your container seeds with a single “sasdemo:sasdemo” user:password instead.

$ # upload SAS_Viya_deployment_data.zip to this machine somehow
$ Git clone https://github.com/sassoftware/sas-container-recipes.git
$ cd sas-container-recipes/
$ vi addons/sssd.conf # <- paste in your site’s sssd.conf file
$ build.sh \
--type single \
--zip ~/SAS_Viya_deployment_data.zip \
--mirror-url http://jo.openstack.sas.com/sas_repo \
--addons “addons/auth-sssd”

The build should take about 45 minutes and produce a single container image for you (there might be a few images, but it is just one with a thin layer or two on top). You might want to give this image a new name (docker tag) and push it into your own private registry (docker push). Aside from that, we are ready to run it.
If you are curious, look in the addons directory for the other optional layers you can add to your container. Several tools are available for easily configuring connections to external databases.

Run the container

Here is the run command we can use to launch the container. Note the image name I use here is “sas-viya-programming:xxxxxx” – this is the image that has my sssd layer built on top of it.

$ docker run \
--detach \ 
--rm \ 
--env CASENV_CAS_HOST=$(hostname -f) \ 
--env CASENV_CAS_VIRTUAL_PORT=8081 \ 
--publish 5570:5570 \ 
--publish 8081:80 \ 
--name sas-viya-programming \ 
--hostname sas-viya-programming \ 
sas-viya-programming:xxxxxx

Connect to the container

And now, in a web browser, I can go to :8081/SASStudio and I will end up in SAS Studio, where I can sign in with my internal SAS credentials. To stop the container, use the name you gave it: “docker stop sas-viya-programming”. Because we used the “--rm” flag the container will be removed (completely destroyed) when we stop it.

Note we are explicitly mapping in the HTTP port (8081:80) so we easily know how to get to SAS Studio. If you want to start up another container here on the same host, you will need to use a different port or else you’ll get an address already in use error. Also note we might be interested in connecting directly to this CAS server from something other than SAS Studio (localhost). A remote python client for example. We can use the other port we mapped in (5570:5570) to connect to the CAS server.

Persist the data

Running this container with the above command means anything and everything done inside the container (configuration changes, code, data) will not persist if the container stops and a new one started later. Luckily this is a very standard and easy to solve scenario with Docker and Kubernetes. Here are a couple of targets inside the container you might be interested in mounting a volume to:

  • /tmp – this is where CAS_DISK_CACHE is by default, not to mention SASWORK. Those are scratch space used by the runtimes. If you are working with small data and don’t care too much about performance, no need to worry about this. But to optimize your container we would suggest mounting a Docker volume to this location (or, ideally, bind mount a high-performance storage device here). Note that generally Docker prefers us to use Docker volumes in lieu of bind mounts, but that is more for manageability, security, and portability than performance.
  • /data – this directory doesn’t necessarily exist but when you mount a volume into a container the target location will be created. So, you could call this target whatever you want, assuming it doesn’t exist yet.  Bind mounting is tempting here and OK to do but consider the scenario when another user wants to run your container following instructions you provided them – better to use a Docker volume than force them to create the directory on the host.  If you have an NFS location, bind mounting that makes sense
  • /code – same spiel as with /data. Once you are in the container you can save your work and it will persist in the docker volume from run to run of your container.

Here is what an updated docker run command might look like with these volumes included:

$docker run \ 
--detach \ 
-rm \ 
--env CASNV_CAS_VIRTUAL_HOST=$(hostname -f) \ 
--env CASNV_CAS_VIRTUAL_PORT=8081 \ 
--volume mydata:/data \ 
--volume /nfsdata:/nfsdata \ # example syntax for bind mount instead of docker volume mount 
--volume mycode:/code \ 
--volume sastmp:/tmp \ 
--publish 5570:5570 \ 
--publish 8081:80 \ 
--name sas-viya-programming \ 
--hostname sas-viya-programming \ 
sas-viya-programming:xxxxxx

Can I run this on my laptop?

Yes. You would just need to install Docker on your laptop (go to docker.com for that). You can certainly follow the instructions from the top to build and run locally. You can even push this container image out to an internal registry so other users could skip the build and just run.

So far, we have only talked about the “ad-hoc” or “sandbox” dev type of use case for this container. A later article may cover how to run in batch mode or maybe we will move straight to multi-containers & Kubernetes. In the meantime though, here is how to submit a .sas program as a batch job to this single container we have built.

Give it a try!

Try creating your own image and deploying a container. Feel free to comment on your experience.

More info:

SAS Communities Article- Running SAS Analytics in a Docker container
SAS Global Forum Paper- Docker Toolkit for Data Scientists – How to Start Doing Data Science in Minutes!
SAS Global Forum Tech Talk Video- Deploying and running SAS in Containers

Getting Started with SAS Containers was published on SAS Users.