data preparation

April 9, 2019

Natural language understanding (NLU) is a subfield of natural language processing (NLP) that enables machine reading comprehension. While both deal with human language, NLU goes beyond the structural analysis of language to interpret intent, resolve context and word ambiguity, and even generate human language on its own. NLU is designed for communicating with non-programmers – to understand their intent and act on it. NLU algorithms tackle the extremely complex problem of semantic interpretation – that is, understanding the intended meaning of spoken or written language, with all its subtleties and human errors, such as mispronunciations or fragmented sentences.

How does it work?

After your data has been analyzed by NLP to identify parts of speech and similar features, NLU uses context to discern the meaning of fragmented and run-on sentences and act on the speaker’s intent. For example, imagine a voice command to Siri or Alexa:

Siri / Alexa play me a …. um song by ... um …. oh I don’t know …. that band I like …. the one you played yesterday …. The Beach Boys … no the bass player … Dick something …

What are the chances of Siri / Alexa playing a song by Dick Dale? That’s where NLU comes in.

NLU reduces human speech (or text) to a structured ontology – a data model comprising a formal, explicit definition of the semantics (meaning) and pragmatics (purpose or goal). The algorithms pull out such things as intent, timing, location and sentiment.

The above example might break down into:

Play song [intent] / yesterday [timing] / Beach Boys [artist] / bass player [artist] / Dick [artist]

By piecing together this information you might just get the song you want!
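To make the breakdown above concrete, here is a minimal Python sketch of pulling an intent, a timing word and artist clues out of the rambling voice command. It is an illustration only, not how Siri, Alexa or any SAS product works: the `Interpretation` class, the `interpret` function and the hard-coded keyword lists are all assumptions made for this example, whereas a real NLU system would learn such patterns from data.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical vocabularies, hard-coded for illustration only.
INTENT_PATTERNS = {"play_song": re.compile(r"\bplay\b.*\bsong\b", re.IGNORECASE)}
TIMING_WORDS = ("yesterday", "today", "last week")
ARTIST_CUES = ("The Beach Boys", "bass player", "Dick")


@dataclass
class Interpretation:
    """Structured result: what the speaker wants, when, and who they may mean."""
    intent: Optional[str] = None
    timing: List[str] = field(default_factory=list)
    artist_clues: List[str] = field(default_factory=list)


def interpret(utterance: str) -> Interpretation:
    """Fill the slots by simple keyword matching, ignoring the filler words."""
    result = Interpretation()
    # The first intent pattern that matches wins.
    for name, pattern in INTENT_PATTERNS.items():
        if pattern.search(utterance):
            result.intent = name
            break
    lowered = utterance.lower()
    result.timing = [w for w in TIMING_WORDS if w in lowered]
    result.artist_clues = [c for c in ARTIST_CUES if c.lower() in lowered]
    return result


UTTERANCE = (
    "Siri play me a um song by um oh I don’t know that band I like "
    "the one you played yesterday The Beach Boys no the bass player Dick something"
)
result = interpret(UTTERANCE)
print(result)
```

Run on the example command, the sketch picks up the play-song intent, the timing word "yesterday" and the three artist clues. Working out which artist those clues actually point to is the harder part, and it is left out here.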

NLU has many important implications for businesses and consumers alike. Here are some common applications:

    Conversational interfaces – bots that can enhance the customer experience and deliver efficiency.
    Virtual assistants – natural language powered, allowing for easy engagement using natural dialogue.
    Call steering – allowing customers to explain, in their own words, why they are calling rather than going through predefined menus.
    Smart listener – allowing users to optimize speech output applications.
    Information summarization – algorithms that can ‘read’ long documents and summarize the meaning and/or sentiment.
    Pre-processing for machine learning (ML) – the information extracted can then be fed into a machine learning recommendation engine or predictive model. For example, NLU and ML are used to sift through novels to predict which would make hit movies at the box office!

Imagine the power of an algorithm that can understand the meaning and nuance of human language in many contexts, from medicine to law to the classroom. As the volumes of unstructured information continue to grow exponentially, we will benefit from computers’ tireless ability to help us make sense of it all.

Further Resources:
Natural Language Processing: What it is and why it matters

White paper: Text Analytics for Executives: What Can Text Analytics Do for Your Organization?

SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, by Teresa Jade, Biljana Belamaric Wilsey, and Michael Wallis

Unstructured Data Analysis: Entity Resolution and Regular Expressions in SAS®, by Matthew Windham

So, you’ve figured out NLP but what’s NLU? was published on SAS Users.

July 10, 2018

Did you know that 80 percent of an analytics life cycle is time spent on data preparation? For many SAS users and administrators, data preparation is what you live and breathe day in and day out.

Your analysis is only as good as your data, and that's why we wanted to shine a light on the importance of data preparation. I reached out to some of our superstar SAS users in the Friends of SAS community (for Canadian SAS customers and partners) for the inside scoop on different kinds of data preparation tasks they deal with on a daily basis.

Meet Kirby Wu, Actuarial Analyst for TD Insurance

At TD Insurance, SAS is used by many different teams for many different functions.

Kirby uses SAS mainly for its data preparation capabilities. This includes joining tables, cleaning the data, summarizations, segmentation and then sharing this ready-to-use data with the appropriate departments. A day in the life of Kirby includes tackling massive data sets containing billions of claim records. He needs powerful software to perform ETL (extract, transform and load) tasks and manage this data, and that’s where SAS comes in.
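For readers who have not done this kind of work, the join, clean and summarize steps can be sketched in a few lines. The example below is plain Python rather than SAS, and it is not Kirby's actual process: the sample records, the field names (`policy_id`, `segment`, `amount`) and the `clean` and `summarize` helpers are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample records; field names are invented for illustration.
policies = [
    {"policy_id": "P1", "segment": "auto"},
    {"policy_id": "P2", "segment": "home"},
]
claims = [
    {"policy_id": "P1", "amount": "1200.50"},
    {"policy_id": "P1", "amount": " 300 "},
    {"policy_id": "P2", "amount": ""},    # missing amount, dropped during cleaning
    {"policy_id": "P2", "amount": "800"},
    {"policy_id": "P9", "amount": "50"},  # no matching policy, dropped by the join
]


def clean(claim):
    """Return the claim with a numeric amount, or None if the amount is unusable."""
    text = claim["amount"].strip()
    if not text:
        return None
    return {"policy_id": claim["policy_id"], "amount": float(text)}


def summarize(policies, claims):
    """Inner-join claims to policies, then total claim amounts per segment."""
    segment_by_policy = {p["policy_id"]: p["segment"] for p in policies}
    totals = defaultdict(float)
    for raw in claims:
        claim = clean(raw)
        if claim is None:
            continue
        segment = segment_by_policy.get(claim["policy_id"])
        if segment is None:
            continue
        totals[segment] += claim["amount"]
    return dict(totals)


summary = summarize(policies, claims)
print(summary)
```

The same three steps apply whatever the tool: drop or repair unusable values, match records on a shared key, then aggregate by the segment the business cares about.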

"SAS is the first step in our job to access good quality data," says Kirby. "Being an actuary, we use SAS to not only pick up data, but to do profiling, tech inquiries and, most importantly, for data quality control purposes. Then we present the data to various teams to take advantage of the findings to improve the business."

Many actuaries have some basic SAS skills to understand the data set. Once they output the data, it is shared across departments and teams for others to make use of.

Prior to his work at TD Insurance, Kirby also used SAS for analytics. He ran generalized linear model (GLM) analyses, where he encountered huge data sets. "When data comes out, we want to understand it, as well as performing statistical analysis on it," says Kirby. "SAS largely influences what direction to go in, and what variable we think is good to use."

Kirby left us with four reasons why he prefers SAS for data preparation.

  1. SAS is an enterprise solution, and the application itself is tried and proven.
  2. Working in insurance, there are many security concerns dealing with sensitive data. SAS provides reassurance in terms of data security.
  3. SAS has been serving the market for many years, and its capabilities and reputation are timeless.
  4. SAS offers a drag-and-drop GUI, as well as programming interfaces for users of varying skill levels.

Meet John Lam of CIBC

Since John joined the bank 15 years ago, he has been using SAS for ETL processes. John saves both time and money using Base SAS when he solves complex ETL tasks in his day-to-day work, and is mainly responsible for data preparation.

John accesses data from multiple source systems and transforms it for business consumption. He works on the technical side and passes on the transformed data within the same organization to the business side. The source data typically comes in at the beginning of each month, but the number of files varies month to month.

"SAS is a great tool," shares John. "The development time is a lot less and helps us save a lot of time on many projects."

John also shared with us some past experiences with complex issues where SAS would have come in handy. He once encountered a situation where he needed to calculate the length of time it would take for someone to receive benefits. The calculation was very complicated, however, and varied greatly depending on how the gap was structured.

"When I look back, the process that took us two to three weeks would have only taken us two to three days if we had used SAS," says John. "SAS would have provided a less complex way of figuring out the problem using date functions!"

Meet Horst Wolter, Manager at TD Bank

"My bread and butter is Base SAS," explains Horst. "The bank has data all over the place in multiple platforms and in multiple forms. We encounter a lot of data – from mainframe to Unix to PC, and flat files or mainframe SAS data sets."

Regardless of the platform or data involved, users always ask for the data to be sliced and diced in different ways. Horst takes all available data and finds ways of matching and merging different pieces together to create something that is relevant and easy to understand.

The majority of the work Horst does is with credit card data. "I check database views that have millions of rows, which include historical data."

The bank deals with millions of customers over many years, resulting in many records. Needless to say, the sizes of data he deals with are quite large! Accessing, processing and managing this data for business insight is a battle SAS helps Horst fight every day.

Sharing Is Caring!

How are you using SAS? Share in a few sentences in the comments!

About Friends of SAS

If you’re not familiar with Friends of SAS, it is an exclusive online community available only to our Canadian SAS customers and partners to recognize and show our appreciation for their affinity to SAS. Members complete activities called "challenges" and earn points that can be redeemed for rewards. There are opportunities to build powerful connections, gain privileged access to SAS resources and events, and boost your learning and development of SAS, all in a fun environment.

Interested in learning more about Friends of SAS? Feel free to reach out with any questions or to get more details.

No typical SAS user: how three professionals prep data using SAS was published on SAS Users.