SAS Books

May 23, 2019
 

Often, the SAS 9.4 administration environment architecture can seem confusing to new administrators. You may be faced with questions like: What is a tier? Why are there so many servers? What is the difference between distributed and non-distributed installations?

Understanding SAS 9.4 architecture is key to tackling the tasks and responsibilities that come with SAS administration and will help you know where to look to make changes or troubleshoot problems. One of the ways I have come to think about SAS 9.4 architecture is to think of it like building a house.

So, what is the first thing you need to build a house? Besides money and a Home Depot rewards credit card, land is the first thing you need to put the house on. For SAS administration, the land is your infrastructure and hardware, and the house you want to build on that land is your SAS software. You, the admin, are the architect. Sometimes building a house can be simple, so only one architect is needed. Other times, for more complex buildings, an entire team of architects is needed to keep things running smoothly.

Once the architect decides how the house should look and function, and the plans are signed off, the foundation is laid. In our analogy, this foundation is the SAS metadata server – the rest of the installation sits on top of it.

Next come the walls and ceilings for either a single-story ranch house (a non-distributed SAS environment) or a multi-story house (a distributed SAS environment). Once the walls are painted, the plumbing installed, and the carpets laid, you have a house made up of different rooms. Each room has a task: a kitchen to make food, a child’s bedroom to sleep in, and a living room to relax and be with family. Each floor and each room serve the same purpose as a SAS server – each server is dedicated to a specific task and has a specific purpose.
Finally, all of the items in each room, such as the bed, toys, and kitchen utensils, can be equated to data sources: a SAS data set, data pulled in from Hadoop, or an Excel spreadsheet. Knowing what is in each room helps you find objects by knowing where they should belong.

Once you move into a house, though, the work doesn’t stop there, and the same is true for a SAS installation. Just like the upkeep on a house (painting the exterior, fixing appliances when they break, etc.), SAS administration requires maintenance to keep everything running smoothly.

How this relates to SAS

To pull this analogy back to SAS, let us start with the different install flavors (single house versus townhouse, single story versus multiple stories). SAS can be installed either as a SAS Foundation install or as a metadata-managed install. A SAS Foundation install is the most basic (think Base SAS). A metadata-managed install is the SAS 9 Intelligence Platform, with many more features and functionality than Base SAS. With SAS Foundation, your users work on their personal machines or use Remote Desktop or Citrix. A SAS Foundation install does not involve a centrally managed metadata system; in a metadata-managed install, however, your users work on the dedicated SAS servers. These two deployments can be installed on physical or virtual machines, and all SAS solution administration is based on SAS 9.4 platform administration.
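As a rough illustration of what “metadata-managed” means in practice, a SAS client session in such a deployment typically points at the metadata server through a handful of system options. This is only a sketch: the host name, repository, user ID, and password below are placeholders, not values from this post.

    /* Hypothetical example: point a SAS session at the metadata server.        */
    /* Host, repository, user, and password are placeholders for your own site. */
    options metaserver="meta.example.com"    /* metadata server host name       */
            metaport=8561                    /* default metadata server port    */
            metarepository="Foundation"      /* metadata repository name        */
            metauser="sasdemo"               /* placeholder user ID             */
            metapass="XXXXXXXX";             /* placeholder password            */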

We hope you find this overview of SAS platform administration helpful. For more information check out this list of links to additional admin resources from my new book, SAS® Administration from the Ground Up: Running the SAS®9 Platform in a Metadata Server Environment.

SAS 9.4 architecture – building an installation from the ground up was published on SAS Users.

May 17, 2019
 

As a publishing house inside of SAS, we often hear: “Does anyone want to read books anymore?” This is especially true of technical programmers who are “too busy” to read. About a quarter of American adults (24%) say they haven’t read a book, in whole or in part, in the past year, whether in print, electronic, or audio form. In addition, leisure reading is at an all-time low in the US. However, we know that as literacy has expanded throughout the world, it has helped reduce inequalities across and within countries. Over the years many articles have been published about how books will soon become endangered species, but can we let that happen when we know the important role books play in education?

At SAS, curiosity and life-long learning are part of our culture. All employees are encouraged to grow their skill set and never stop learning! While different people do have different preferred learning styles, statistics show that reading is critical to the development of life-long learners, something we agree with at SAS Press:

  • In a study completed at Yale University, researchers studied 3,635 people older than 50 and found that those who read books for 30 minutes daily lived an average of 23 months longer than nonreaders or magazine readers. The study stated that the practice of reading books creates a cognitive engagement that improves a host of different things, including vocabulary, cognitive skills, and concentration. Reading can also affect empathy, social perception, and emotional intelligence, which all help people stay on the planet longer.
  • Vocabulary is notoriously resistant to aging, and having a vast one, according to researchers from Spain’s University of Santiago de Compostela, can significantly delay the manifestation of mental decline. When a research team at the university analyzed vocabulary test scores of more than 300 volunteers ages 50 and older, they found that participants with the lowest scores were between three and four times more at risk of cognitive decay than participants with the highest scores.
  • One international study of long-term economic trends among nations found that, along with math and science, “reading performance is strongly and significantly related to economic growth.”

Putting life-long learning into practice

Knowing the important role that reading plays in adult life-long learning, SAS has also been working hard to improve reading proficiency in young learners — proficiency which often ties directly to the number of books in the home, how often parents read to young learners, and how much the adults around them read themselves.

High-quality Pre-K lays the foundation for third-grade reading proficiency which is critical to future success in a knowledge-driven economy. — Dr. Jim Goodnight

With all the research pointing to why reading is so important for improving your vocabulary and mental fortitude, it seems only natural to learn SAS through our example-driven, in-depth books.

So to celebrate #endangeredspecies day and help save what some call an “endangered species,” let’s think about:

  • What SAS books have you promised yourself you would read this year?
  • What SAS books will you read to continue your journey as a life-long learner?
  • What book do you think will get you to the next level of your SAS journey?

Let us know in the comments: what SAS book improved your love of SAS and took you on a life-long learning journey?

For almost thirty years, SAS Press has published books by SAS users, for SAS users. Want to find out more about SAS Press? For more about our books and more of our SAS Press fun, subscribe to our newsletter. You’ll get all the latest news and exclusive newsletter discounts. Also, check out all our new SAS books at our online bookstore.

Other Resources:
About SAS: Education Outreach
About SAS: Reading Proficiency
Poor reading skills stymie children and the N.C. economy by Dr. Jim Goodnight

Do books count as endangered species? was published on SAS Users.

May 14, 2019
 

Interested in making business decisions with big data analytics? Our Wiley SAS Business Series book Profit Driven Business Analytics: A Practitioner’s Guide to Transforming Big Data into Added Value by Bart Baesens, Wouter Verbeke, and Cristian Danilo Bravo Roman has just the information you need to learn how to use SAS to make data and analytics decision-making a part of your core business model!

This book combines the authorial team’s worldwide consulting experience and high-quality research to open up a road map to handling data, optimizing data analytics for specific companies, and continuously evaluating and improving the entire process.

In the following excerpt from their book, the authors describe a value-centric strategy for using analytics to heighten the accuracy of your enterprise decisions:

“'Data is the new oil' is a popular quote pinpointing the increasing value of data and — to our liking — accurately characterizes data as raw material. Data are to be seen as an input or basic resource needing further processing before actually being of use.”

Analytics process model

In our book, we introduce the analytics process model that describes the iterative chain of processing steps involved in turning data into information or decisions, which is actually quite similar to an oil refinery process. Note the subtle but significant difference between the words data and information in the sentence above. Whereas data can fundamentally be defined as a sequence of zeroes and ones, information is essentially the same but additionally implies a certain utility or value to the end user or recipient.

So, whether data are information depends on whether the data have utility to the recipient. Typically, for raw data to be information, the data first need to be processed, aggregated, summarized, and compared. In summary, data typically need to be analyzed, and insight, understanding, or knowledge should be added for data to become useful.

Applying basic operations on a dataset may already provide useful insight and support the end user or recipient in decision making. These basic operations mainly involve selection and aggregation. Both selection and aggregation may be performed in many ways, leading to a plentitude of indicators or statistics that can be distilled from raw data. Providing insight by customized reporting is exactly what the field of business intelligence (BI) is about.
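As a minimal sketch of what such selection and aggregation might look like in SAS (the table and variable names work.transactions, amount, region, and txn_date are placeholders, not from the book), consider:

    /* Hypothetical example: basic selection and aggregation on raw data. */
    /* The table and variable names are placeholders.                     */
    proc sql;
        create table work.region_summary as
        select region,
               count(*)     as n_transactions,
               sum(amount)  as total_amount,
               mean(amount) as avg_amount
        from work.transactions
        where txn_date >= '01JAN2019'd       /* selection   */
        group by region;                     /* aggregation */
    quit;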

Business intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to — and analysis of — information to improve and optimize decisions and performance.

The analytics process model defines the following steps in the development, implementation, and operation of analytics within an organization.

    Step 1
    As a first step, a thorough definition of the business problem to be addressed is needed. The objective of applying analytics needs to be unambiguously defined. Some examples are: customer segmentation of a mortgage portfolio, retention modeling for a postpaid Telco subscription, or fraud detection for credit cards. Defining the perimeter of the analytical modeling exercise requires a close collaboration between the data scientists and business experts. Both parties need to agree on a set of key concepts; these may include how we define a customer, transaction, churn, or fraud. Whereas this may seem self-evident, it appears to be a crucial success factor to make sure a common understanding of the goal and some key concepts is agreed on by all involved stakeholders.

    Step 2
    Next, all source data that could be of potential interest need to be identified. The golden rule here is: the more data, the better! The analytical model itself will later decide which data are relevant and which are not for the task at hand. All data will then be gathered and consolidated in a staging area which could be, for example, a data warehouse, data mart, or even a simple spreadsheet file. Some basic exploratory data analysis can then be considered using, for instance, OLAP facilities for multidimensional analysis (e.g., roll-up, drill down, slicing and dicing).

    Step 3
    Next, in the analytics step, an analytical model is estimated on the preprocessed and transformed data. Depending on the business objective and the exact task at hand, a particular analytical technique will be selected and implemented by the data scientist (a minimal sketch of this step appears after Step 5).

    Step 4
    Finally, once the results are obtained, they will be interpreted and evaluated by the business experts. Results may be clusters, rules, patterns, or relations, among others, all of which will be called analytical models resulting from applying analytics. Trivial patterns that may be detected by the analytical model (e.g., an association rule stating that spaghetti and spaghetti sauce are often purchased together) are interesting because they help to validate the model. But of course, the key issue is to find the unknown yet interesting and actionable patterns (sometimes also referred to as knowledge diamonds) that can provide new insights into your data that can then be translated into new profit opportunities!

    Step 5
    Once the analytical model has been appropriately validated and approved, it can be put into production as an analytics application (e.g., decision support system, scoring engine). Important considerations here are how to represent the model output in a user-friendly way, how to integrate it with other applications (e.g., marketing campaign management tools, risk engines), and how to make sure the analytical model can be appropriately monitored and back-tested on an ongoing basis.
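Purely as an illustration of the analytics step (Step 3) referenced above, the sketch below fits a simple churn model with PROC LOGISTIC. The data set work.churn_abt and its variables (churn, tenure, num_complaints, contract_type) are hypothetical, and a real project would add variable selection, validation, and tuning:

    /* Hypothetical example: estimate a basic churn model (analytics step). */
    /* work.churn_abt and its variables are placeholder names.              */
    proc logistic data=work.churn_abt;
        class contract_type / param=ref;
        model churn(event='1') = tenure num_complaints contract_type;
        output out=work.churn_scored p=p_churn;   /* predicted churn probability */
    run;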

Book giveaway!

If you are as excited about business analytics as we are and want a copy of Bart Baesens’ book Profit Driven Business Analytics: A Practitioner’s Guide to Transforming Big Data into Added Value, enter to win a free copy in our book giveaway today! The first five commenters to correctly answer the question below get a free copy of Baesens’ book! Winners will be contacted via email.

Here's the question:
What free SAS Press e-book did Bart Baesens write the foreword to?

We look forward to your answers!

Further resources

Want to prove your business analytics skills to the world? Check out our Statistical Business Analyst Using SAS 9 certification guide by Joni Shreve and Donna Dea Holland! This certification is designed for SAS professionals who use SAS/STAT software to conduct and interpret complex statistical data analysis.

For more information about the certification and certification prep guide, watch this video from co-author Joni Shreve on their SAS Certification Prep Guide: Statistical Business Analysis Using SAS 9.

Big data in business analytics: Talking about the analytics process model was published on SAS Users.

May 10, 2019
 

May 12th is #NationalLimerickDay! If you saw our Valentine’s Day poem, you know we at SAS Press love creating poems and fun rhymes, so check out our limericks below!

So, what’s a limerick?

National Limerick Day is observed each year on May 12th and honors the birthday of the famed English artist, illustrator, author and poet Edward Lear (May 12, 1812 – Jan. 29, 1888). Lear’s poetry is most famous for its nonsense or absurdity, and mostly consists of prose and limericks.

His book, “Book of Nonsense,” published in 1846, popularized the limerick poem.

A limerick poem has five lines and is often very short, humorous, and full of nonsense. In a limerick, the first two lines rhyme with the fifth line, and the third and fourth lines rhyme with each other. The limerick’s rhythm is officially described as anapestic meter.

To celebrate, we want to ask all lovers of SAS books to enjoy the limericks written by us and to see if you can create your own! Can you top our limericks on our love for SAS Books? Check out our handy how-to limerick links below.

Our limericks

There once was a software named SAS
helping tons of analysts complete tasks.
a Text Analytics book to extract meaning as data flies by
and a Portfolio and Investment Analysis book so you’ll never go awry.
You know our SAS books are first-class!

We enjoyed meeting our awesome users at SAS Global Forum
who enjoy our books with true decorum.
a SAS Administration book on building from the ground up
and a new book about PROC SQL you need to pick up.
Check out our SAS books today, you’ll adore ‘em!

For more about SAS Books and some more of our SAS Press fun, subscribe to our newsletter. You’ll get all the latest news and exclusive newsletter discounts. Also check out all our new SAS books at our online bookstore.

Resources:
Wiki-How: How to Write A Limerick
Limerick Generator: Create a Limerick in Seconds

Happy National Limerick Day from SAS Press! was published on SAS Users.

April 3, 2019
 

Structuring a highly unstructured data source

Human language is astoundingly complex and diverse. We express ourselves in infinite ways. It can be very difficult to model and extract meaning from both written and spoken language. Usually the most meaningful analysis uses a number of techniques.

While supervised and unsupervised learning, and specifically deep learning, are widely used for modeling human language, there’s also a need for syntactic and semantic understanding and domain expertise. Natural Language Processing (NLP) is important because it can help to resolve ambiguity and add useful numeric structure to the data for many downstream applications, such as speech recognition or text analytics. The outputs from NLP are then run through data mining and machine learning algorithms to automatically extract key features and relational concepts. Human input from linguistic rules adds to the process, enabling contextual comprehension.

Text analytics provides structure to unstructured data so it can be easily analyzed. In this blog, I would like to focus on two widely used text analytics techniques: information extraction and entity resolution.

Information Extraction

Information Extraction (IE) automatically extracts structured information from unstructured or semi-structured text (for example, a text file) to create new structured data. IE works at the sub-document level, in contrast with techniques such as categorization, which work at the document or record level. Therefore, the results of IE can further feed into other analyses, like predictive modeling or topic identification, as features for those processes. IE can also be used to create a new database of information. One example is the recording of key information about terrorist attacks from a group of news articles on terrorism. Any given IE task has a defined template, which is a case frame (or a set of case frames) to hold the information contained in a single document. For the terrorism example, a template would have slots corresponding to the perpetrator, victim, and weapon of the attack, and the date on which the event happened. An IE system for this problem is required to “understand” an attack article only enough to find data corresponding to the slots in this template. Such a database can then be used and analyzed through queries and reports about the data.
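The book discussed below builds this kind of extraction with SAS concept rules; purely as a rough sketch of the idea of filling template slots, the Base SAS step below instead uses regular expressions to pull a date and a weapon mention out of free text. The input data set work.news_articles and its text variable are hypothetical:

    /* Hypothetical sketch: fill two template slots (date, weapon) from raw text. */
    /* work.news_articles and its TEXT variable are placeholder names.            */
    data work.attack_template;
        set work.news_articles;
        length event_date weapon $40;
        if _n_ = 1 then do;
            retain rx;
            rx = prxparse('/on (\w+ \d{1,2}, \d{4}).{0,200}?\b(bomb|firearm|vehicle)\b/i');
        end;
        if prxmatch(rx, text) then do;
            event_date = prxposn(rx, 1, text);   /* first capture buffer: the date   */
            weapon     = prxposn(rx, 2, text);   /* second capture buffer: the weapon */
        end;
        drop rx;
    run;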

In their new book, SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, authors Teresa Jade, Biljana Belamaric Wilsey, and Michael Wallis, give some great examples of uses of IE:

"One good use case for IE is for creating a faceted search system. Faceted search allows users to narrow down search results by classifying results by using multiple dimensions, called facets, simultaneously. For example, faceted search may be used when analysts try to determine why and where immigrants may perish. The analysts might want to correlate geographical information with information that describes the causes of the deaths in order to determine what actions to take."

Another good example of using IE in predictive models comes from analysts at a bank who want to determine why customers close their accounts. They have an active churn model that works fairly well at identifying potential churn, but less well at determining what causes the churn. An IE model could be built to identify different bank policies and offerings, and then track mentions of each during any customer interaction. If a particular policy could be linked to certain churn behavior, then the policy could be modified to reduce the number of lost customers.

Reporting information found as a result of IE can provide deeper insight into trends and uncover details that were buried in the unstructured data. An example of this is an analysis of call center notes at an appliance manufacturing company. The results of IE show a pattern of customer-initiated calls about repairs and breakdowns of a type of refrigerator, and the results highlight particular problems with the doors. This information shows up as a pattern of increasing calls. Because the content of the calls is being analyzed, the company can return to its design team, which can find and remedy the root problem.

Entity Resolution and regular expressions

Entity Resolution is the technique of recognizing when two observations relate to the same entity (thing, person, company) despite having been described differently; conversely, it is also about recognizing when two observations do not relate to the same entity despite having been described similarly. For example, you might be listed in one database as S Roberts, Sian Roberts, and S. Roberts. All refer to the same person but would be treated as different people in an analysis unless they are resolved (combined into one person).
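As a very small sketch of that idea (assuming a hypothetical work.customers data set with a name variable; real entity resolution would also compare addresses and birth dates and use fuzzy matching), you could build a crude match key so that spelling variants collapse together:

    /* Hypothetical sketch: a crude match key so name variants sort together.   */
    /* work.customers and NAME are placeholders; this only normalizes spelling. */
    data work.customers_keyed;
        set work.customers;
        length match_key $60;
        /* Replace periods with blanks, squeeze repeated blanks, uppercase */
        match_key = upcase(compbl(translate(name, ' ', '.')));
    run;

    proc sort data=work.customers_keyed;
        by match_key;
    run;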

Entity resolution can be performed as part of a data pre-processing step or as part of text analysis. Basically, one helps resolve multiple entries (it cleans the data), and the other resolves references to a single entity in order to extract meaning: for example, pronoun resolution, when “it” refers to a particular company mentioned earlier in the text. Here is another example:

Assume each numbered item is a separate observation in the input data set:
1. SAS Institute is a great company. Our company has a recreation center and health care center for employees.
2. Our company has won many awards.
3. SAS Institute was founded in 1976.

The scoring output matches are below; note that the document ID associated with each match aligns with the number before the input document where the match was found.

Unstructured data clean-up

In the following section we focus on the pre-processing clean-up of the data. Unstructured data is the most voluminous form of data in the world, and analysts rarely receive it in perfect condition for processing. In other words, textual data needs to be cleaned, transformed, and enhanced before value can be derived from it.

A regular expression is a pattern that the regular expression engine attempts to match in input text. In SAS programming, regular expressions are written as strings of letters and special characters that are recognized by certain built-in SAS functions for the purpose of searching and matching. Combined with other built-in SAS functions and procedures, and with techniques such as entity resolution, they give you tremendous capabilities. Matthew Windham, author of Unstructured Data Analysis: Entity Resolution and Regular Expressions in SAS®, gives some great examples in his book of how you might use these techniques to clean your text data. Here we share one of them:

"As you are probably familiar with, data is rarely provided to analysts in a form that is immediately useful. It is frequently necessary to clean, transform, and enhance source data before it can be used—especially textual data."

Extract, Transform, and Load (ETL) is a general set of processes for extracting data from its source, modifying it to fit your end needs, and loading it into a target location that enables you to best use it (e.g., database, data store, data warehouse). We’re going to begin with a fairly basic example to get us started. Suppose we already have a SAS data set of customer addresses that contains some data quality issues. The method of recording the data is unknown to us, but visual inspection has revealed numerous occurrences of duplicative records. In the example below, it is clearly the same individual with slightly different representations of the address and encoding for gender. But how do we fix such problems automatically for all of the records?

First Name | Last Name | DOB      | Gender | Street            | City     | State | Zip
Robert     | Smith     | 2/5/1967 | M      | 123 Fourth Street | Fairfax, | VA    | 22030
Robert     | Smith     | 2/5/1967 | Male   | 123 Fourth St.    | Fairfax  | va    | 22030

Using regular expressions, we can algorithmically standardize abbreviations, remove punctuation, and do much more to ensure that each record is directly comparable. In this case, regular expressions enable us to perform more effective record keeping, which ultimately impacts downstream analysis and reporting. We can easily leverage regular expressions to ensure that each record adheres to institutional standards. We can make each occurrence of Gender either “M/F” or “Male/Female,” make every instance of the Street variable use “Street” or “St.” in the address line, make each City variable include or exclude the comma, and abbreviate State as either all caps or all lowercase. This example is quite simple, but it reveals the power of applying some basic data standardization techniques to data sets. By enforcing these standards across the entire data set, we are then able to properly identify duplicative references within the data set. In addition to making our analysis and reporting less error-prone, we can reduce data storage space and duplicative business activities associated with each record (for example, fewer customer catalogs will be mailed out, thus saving money).
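A minimal sketch of that standardization in a SAS DATA step might look like the following; the data set names work.customers_raw and work.customers_std are placeholders, and the specific rules would need to reflect your own institutional standards:

    /* Hypothetical sketch: standardize address records with regular expressions. */
    /* work.customers_raw and work.customers_std are placeholder data set names.  */
    data work.customers_std;
        set work.customers_raw;
        /* Expand "St" or "St." at the end of the address line to "Street" */
        street = prxchange('s/\bSt\.?\s*$/Street/i', 1, strip(street));
        /* Drop a trailing comma from the city name */
        city   = prxchange('s/,\s*$//', 1, strip(city));
        /* Uppercase the state abbreviation */
        state  = upcase(strip(state));
        /* Encode gender as a single letter */
        if upcase(strip(gender)) in ('M', 'MALE')        then gender = 'M';
        else if upcase(strip(gender)) in ('F', 'FEMALE') then gender = 'F';
    run;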

Your unstructured text data is growing daily, and data without analytics is opportunity yet to be realized. Discover the value in your data with text analytics capabilities from SAS. The SAS Platform fosters collaboration by providing a toolbox where best practice pipelines and methods can be shared. SAS also seamlessly integrates with existing systems and open source technology.

Further Resources:
Natural Language Processing: What it is and why it matters

White paper: Text Analytics for Executives: What Can Text Analytics Do for Your Organization?

SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, by Teresa Jade, Biljana Belamaric Wilsey, and Michael Wallis

Unstructured Data Analysis: Entity Resolution and Regular Expressions in SAS®, by Matthew Windham

Text analytics explained was published on SAS Users.