
August 20, 2018
 

As SAS Global Forum 2019 Conference Chair, I am excited to share our plans for an extraordinary SAS experience in Dallas, Texas, April 28-May 1. SAS has always been my analytic tool of choice, and it has given me an amazing journey in my professional career. Having attended many SAS global conferences throughout the years, I've found this is the best venue to learn, contribute, and network with other SAS colleagues.

The content will not disappoint – submit your abstract now
What presentation topics will you hear at SAS Global Forum? In 2019, expanded topics will include deep dives into text analytics, business visualization, AI/Machine Learning, SAS administration, open source integration and career development discussions. As always, SAS basic programming techniques using the current SAS software environment will be the underpinning of many sessions.

Do you want to be a part of that great content but have never presented at a conference? Now could be the time to learn new skills by participating in the SAS Global Forum Presenter Mentoring Program. Available to first-time presenters, the program pairs you with a seasoned expert who will guide you in preparing and delivering a professional presentation. The mentors can also help before you even submit your abstract in the call for content – which is now open. Get help crafting your title and abstract for your submission. The open call is available through October 22.

How to get to Dallas
Did you know there are scholarships and awards to help SAS users attend SAS Global Forum? Our award program has been enhanced for New SAS Professionals. This award provides an opportunity for new SAS users to enhance their basic SAS skills as well as learn about the latest technology in their line of business. Hear what industry experts are doing to solve business issues by sharing real-world evidence cases.

In addition, we're offering an enhanced International Professional Award focused on global engagement and participation. This award is available to those outside of the continental 48 states to share their expertise and industry solutions from around the world. At the conference, you will have a chance to learn from and network with international professionals who work on analytic projects similar to your own.

Don’t miss out on this valuable experience! Submit your abstract for consideration to be a presenter! I look forward to seeing you in Dallas and hearing about your work.

Call for content now open for SAS Global Forum 2019 was published on SAS Users.

August 13, 2018
 

Data in the cloud is easily accessible and can help businesses run more smoothly. SAS Viya runs its calculations in SAS Cloud Analytic Services (CAS). David Shannon of Amadeus Software spoke at SAS Global Forum 2018 and presented his paper, Come On, Baby, Light my SAS Viya: Programming for CAS. (In addition to being an avid SAS user and partner, David must be an avid Doors fan.) This article summarizes David's overview of how to run SAS programs in SAS Viya and how to use CAS sessions and libraries.

If you're using SAS Viya, you're going to need to know the basics of CAS to be able to perform calculations and use SAS Viya to its potential. SAS 9 programs are compatible with SAS Viya, and will run as-is through the CAS engine.

Using CAS sessions and libraries

Use a CAS statement to kick off a session, then use CAS libraries (caslibs) to store data and resources. To start a session, simply submit "cas;". Each CAS session is given its own unique identifier (UUID) that you can use to reconnect to the session.
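
As a minimal sketch (using the session name from David's examples; the libref is illustrative), you can start a named session, capture its UUID in a macro variable for later reconnection, and assign a libref bound to that session:

cas speedyanalytics uuidmac=speedyanalytics_uuid;  /* start the session, save its UUID in &speedyanalytics_uuid */
libname mycas cas sessref=speedyanalytics;         /* libref tied to this CAS session */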


There are a few key statements that can help you master CAS session operations. Consider these examples, based on a CAS session that David labeled "speedyanalytics":

  • What CAS sessions do I have running?
    cas _all_ list;
  • Get the version and license specifics from the CAS server hosting my session:
    cas speedyanalytics listabout;
  • I want to sign out of SAS Studio for now, so I will disconnect from my CAS session, but return to it later…
    cas speedyanalytics disconnect;
  • ...later in the same or a different SAS Studio session, I want to reconnect to the CAS session I started earlier, using the UUID I previously grabbed from the macro variable or SAS log:
    cas uuid="&speedyanalytics_uuid";
  • At the end of my program(s), shut down all my CAS sessions to release resources on the server:
    cas _all_ terminate;

Using CAS libraries

CAS libraries (caslibs) are the way you access data stored in memory, along with the related metadata.

From a caslib, you can load data into CAS tables in a few different ways (a short sketch of the first two follows this list):

  1. A DATA step can take a sample data set, calculate a new measure, and store the output in memory
  2. PROC COPY can bring existing SAS data into a caslib
  3. PROC CASUTIL loads tables into caslibs
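
For illustration, here is a minimal sketch of the first two approaches. It assumes an active CAS session and access to the CASUSER caslib; the libref and table names are made up:

/* Bind a libref to the CASUSER caslib */
libname mycas cas caslib=casuser;

/* 1. DATA step: calculate a new measure and store the output in memory */
data mycas.class_plus;
   set sashelp.class;
   bmi = (weight / (height*height)) * 703;   /* new measure */
run;

/* 2. PROC COPY: bring an existing SAS data set into the caslib */
proc copy in=sashelp out=mycas;
   select class;
run;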

PROC CASUTIL allows you to save your tables (the "classsi" data in David's examples) for future use through the SAVE statement:

proc casutil;
 save casdata="classsi" casout="classsi";
run;

And reload like this in a future session, using the LOAD statement:

proc casutil;
 load casdata="classsi" casout="classsi";
run;

When accessing your CAS libraries, remember that there are multiple levels of scope that can apply. "Session" refers to data from just the current session, whereas "Global" allows you to reach data from all CAS sessions.
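
For example, a session-scope table can be made visible to other sessions by promoting it to global scope. Here is a minimal sketch with PROC CASUTIL; the table and caslib names are illustrative:

proc casutil;
   /* promote the session-scope table so other CAS sessions can access it */
   promote casdata="classsi" incaslib="casuser" outcaslib="casuser";
run;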

Programming in CAS

Showing how to put CAS into action, David shared this diagram of a typical load/save/share flow:

Existing SAS 9 programs and CAS code can both be run in SAS Viya. Calculations and in-memory data management occur through CAS, SAS Cloud Analytic Services. Before beginning, it's important to have a general overview of CAS so that you can access CAS libraries and your data. For more about the CAS architecture, read this paper from CAS developer Jerry Pendergrass.

The performance case for SAS Viya

To close out his paper, David outlined a small experiment he ran to demonstrate the performance advantages of SAS Viya 3.3 over a standard, stand-alone SAS 9.4 environment. The test was basic, but it performed reads, writes, and analytics on a 5GB table. The tests revealed about a 50 percent increase in performance between CAS and SAS 9 (see the paper for a detailed table of comparison metrics). SAS Viya is engineered for distributed computing (which works especially well in cloud deployments), so more extensive tests could certainly reveal even further performance gains in many use cases.

Additional resources

A quick introduction to CAS in SAS Viya was published on SAS Users.

August 2, 2018
 

SAS Viya has opened up an entirely new set of capabilities, allowing SAS to run analytics on cloud technology in real time. One of the best new features of SAS Viya is its ability to pair with open source platforms, giving developers freedom of language and implementation while integrating with the power of SAS analytics.

At SAS Global Forum 2018, Sean Ankenbruck and Grace Heyne Lybrand from Zencos Consulting led the talk, SAS Viya: The Beauty of REST in Action. While the paper – and this blog post – outlines the use of Python and SAS Viya, note that SAS Viya integrates with R, Java and Lua as well.

Nonetheless, this Python integration example shows how easy it is to integrate SAS Viya and open source technologies. Here is the basic workflow:

1. A developer creates a web application, in a language of their choice
2. A user enters data in the web application
3. The collected data is passed to SAS Viya via the defined APIs
4. Analysis is performed in Viya using SAS actions
5. Results are passed back to the web application
6. The web application presents the results to the user

About the process

SAS’ Cloud Analytic Services (CAS) acts as a server to analyze data, and REST APIs are used to integrate many programming languages with SAS Viya. REST stands for Representational State Transfer and is a set of constraints that allows scalability and integration of multiple web-based systems. In layman’s terms, it’s a set of software design patterns that provides handy connector points from one web app to another. The REST API is what developers use to interact with and submit requests to the processing system.

CAS actions are what allow “tasks” to be completed on SAS Viya. These “tasks” are under the categories of Statistics, Analytics, System, and Data Mining and Machine Learning.

Integration with Python

To access CAS from Python, the SAS Scripting Wrapper for Analytics Transfer (SWAT) package is used, letting Python conventions drive CAS actions. To create this interface, data must be captured through a web application in a format that Python can transmit to SAS Viya.
To connect Python to CAS, the following is necessary (a minimal connection sketch follows this list):

• Hostname
• CAS Port
• Username
• Password
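
Here is a minimal connection sketch using the SWAT package; the host name, port, and credentials below are placeholders to replace with your own values:

# Connect to CAS with SWAT (placeholder host, port, and credentials)
import swat

conn = swat.CAS('cas-server.example.com', 5570,
                username='myuser', password='mypassword')

# Confirm the connection by asking the server about itself
print(conn.serverstatus())

conn.close()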

Let’s see it in action

As an example, one project about wine preferences collected data through a questionnaire and held it in a Pandas DataFrame. Once the information was gathered, the data was uploaded to SAS Viya. A model was created from common terms reviewers use to describe wines, feeding into a decision tree. The CAS server scored the users' responses in real time, then sent the results back to the user, providing suggested wines to match their inputs.

Process for model

Code to train the decision tree (Python, via the SWAT connection):

# Load the decision tree action set on the CAS connection
conn.loadactionset("decisionTree")

# Train the tree; 'vars' is assumed to be a list of input variable names
conn.decisionTree.dTreeTrain(
      casOut = {"name": "tree_model"},
      inputs = vars,
      modelId = "DT_wine_variety",
      table = {"caslib": "public", "name": "wines_model_data"},
      target = "variety")

"Decision" given to user

Conclusion

SAS Viya has opened SAS to a plethora of opportunities, allowing many different programming languages to be interpreted and quickly integrated, giving analysts and data scientists more flexibility.

Additional resources

At Your Service: Using SAS® Viya™ and Python to Create Worker Programs for Real-Time Analytics, Jon Klopfer, Scott Koval, and Mia List
SAS Viya
sas-viya-programming on github
python-swat on github
SAS Global Forum

Additional SAS Viya talks from SAS Global Forum

A Need for Speed: Loading Data via the Cloud, Henry Christoffels
Come On, Baby, Light my SAS® Viya®: Programming for CAS, David Shannon
Just Enough SAS® Cloud Analytic Services: CAS Actions for SAS® Visual Analytics Report Developers, Michael Drutar
Running SAS Viya on Oracle Cloud without Sacrificing Performance, Dan Grant
Command-Line Administration in SAS® Viya®, Danny Hamrick
Five Approaches for High-Performance Data Loading to the SAS® Cloud Analytic Services Server, Rob Collum

How SAS Viya uses REST APIs to integrate with Python was published on SAS Users.

July 26, 2018
 

SAS Text Analytics analyzes documents at the document level by default, but sometimes sentence-level analysis provides further insights into the data. Two years ago, the SAS Text Analytics team did some research on sentence-level text analysis and shared their discoveries in the SAS Global Forum paper Getting More from the Singular Value Decomposition (SVD): Enhance Your Models with Document, Sentence, and Term Representations. Recently my team started working on a concept extraction project. We need to extract all sentences containing one or two query words, so that linguists don't need to read the entire documents in order to write concept extraction rules. This significantly improves their efficiency in rule development and tuning.

Sentence boundary detection

Sentence boundary detection is a challenge in Natural Language Processing -- it's more complicated than you might expect. For example, most sentences in English end with a period, but sometimes a period denotes an abbreviation or is part of an ellipsis. My colleagues Biljana and Teresa wrote an article about the complexities of how a period may be used. If you are interested in this topic, please check out their article Text analytics through linguists' eyes: When is a period not a full stop?

Sentence boundary rules differ across languages, and when you work with multilingual data you might want one set of code that handles data in all of those languages. For example, a period in German can mark the end of an ordinal number token; in Chinese, the sentence-final period is a different character from the English period; and Thai does not use a period to mark the end of a sentence.

Here are several sentence boundary examples:

Sentence | Language | Text
1 | English | Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990.
2 | English | I paid $23.45 for this book.
3 | English | We earn more and more money, but we feel less and less happier. So…what happened to us?
4 | Chinese | 北京确实人多车多,但是根源在哪里?
5 | Chinese | 在于首都集中了太多全国性资源。
6 | German | Was sind die Konsequenzen der Abstimmung vom 12. Juni?

How to tokenize documents into sentences with SAS?

There are several methods to build a sentence tokenizer with SAS Text Analytics. Here I list only three:

  • Method 1: Use the applyConcept CAS action and SAS Viya
  • Method 2: Use the tpParse CAS action and SAS Viya
  • Method 3: Use SAS DATA step code and SAS 9

Among these three methods, I recommend the first one, because it extracts sentences while keeping the raw text intact. With the second method, uppercase letters are changed to lowercase during parsing, and some unseen characters are replaced with white space. The third method is based on traditional SAS 9 technology (not SAS Viya), so it might not scale as well to large data.

In this article, I show the SAS code for only the first two methods. For details of the SAS code for the third method, please check out the paper Getting More from the Singular Value Decomposition (SVD): Enhance Your Models with Document, Sentence, and Term Representations.

Method 1: Use the applyConcept CAS action. The applyConcept action performs concept extraction using a concept extraction model that you compile and validate.

%macro sentenceTokenizer1(
   dsIn=,
   docVar=,
   textVar=,
   language=,
   dsOut=
);
/* Rule for determining sentence boundaries */
data sascas1.concept_rule;
   length rule $ 200;
   ruleId=1;
   rule='ENABLE:SentBoundaries';
   output;
 
   ruleId=2;
   rule='PREDICATE_RULE:SentBoundaries(first,last):(SENT,"_first{_w}","_last{_w}")';
   output;
run;
 
proc cas;
textRuleDevelop.validateConcept / 
   table={name="concept_rule"}
   config='rule'
   ruleId='ruleId'
   language="&language"
   casOut={name='outValidation',replace=TRUE}
;
run;
quit;
 
/* Compile concept rule; */
proc cas;
textRuleDevelop.compileConcept / 
   table={name="concept_rule"}
   config="rule"
   enablePredefined=false
   language="&language"
   casOut={name="outli", replace=TRUE}
;
run;
quit;
 
/* Get Sentences */
proc cas;
textRuleScore.applyConcept / 
   table={name="&dsIn"}
   docId="&docVar"
   text="&textVar"
   language="&language"
   model={name="outli"}
   matchType="best"
   casOut={name="outpos_eli", replace=TRUE}
   factOut={name="&dsOut", replace=TRUE, where="_fact_argument_=''"}
;
run;
quit;
 
proc cas;
   table.dropTable name="concept_rule" quiet=true; run;
   table.dropTable name="outli" quiet=true; run;
   table.dropTable name="outpos_eli" quiet=true; run;
quit; 
%mend sentenceTokenizer1;

Method 2: Use the tpParse CAS action, the NLP parsing action that produces token offsets (including sentence numbers) that can be reassembled into sentences.

%macro sentenceTokenizer2(
   dsIn=,
   docVar=,
   textVar=,
   language=,
   dsOut=
);
/* Parse the data set */
proc cas;
textparse.tpParse /
   docId="&docVar"
   documents={name="&dsIn"}
   text="&textVar"
   language="&language"
   cellWeight="NONE"
   stemming=false
   tagging=false
   noungroups=false
   entities="none"
   selectAttribute={opType="IGNORE",tagList={}}
   selectPos={opType="IGNORE",tagList={}}
   offset={name="offset",replace=TRUE}
;
run;
 
/* Get Sentences */
proc cas;
table.partition / 
   table={name="offset" 
          groupby={{name="_document_"}, {name="_sentence_"}}
          orderby={{name="_start_"}}
         }
   casout={name="offset" replace=true};
run;
 
datastep.runCode /
code= "
data &dsOut;
   set offset;
   by _document_ _sentence_ _start_;
   length _text_ varchar(20000);
   if first._sentence_ then do;
      _text_='';
      _lag_end_ = -1;
   end;  
   if _start_=_lag_end_+1 then
      _text_=cats(_text_, _term_);
   else
      _text_=trim(_text_)||repeat(' ',_start_-_lag_end_-2)||_term_;
   _lag_end_=_end_;  
   if last._sentence_ then output;
   retain _text_ _lag_end_;
   keep _document_ _sentence_ _text_;
run;
";
run;   
quit;
 
proc cas;
   table.dropTable name="offset" quiet=true; run;
quit; 
%mend sentenceTokenizer2;

Here are three examples (Chinese, English, and German texts) that call each of the two tokenizer macros:

/*-------------------------------------*/
/* Start CAS Server.                   */
/*-------------------------------------*/
cas casauto host="host.example.com" port=5570;
libname sascas1 cas;
 
/*-------------------------------------*/
/* Example 1: Chinese texts            */
/*-------------------------------------*/
data sascas1.text_zh;
   infile cards dlm='|' missover;
   input _document_ text :$200.;
   cards;
1|北京确实人多车多,但是根源在哪里?在于首都集中了太多全国性资源。
;
run;   
 
%sentenceTokenizer1(
   dsIn=text_zh,
   docVar=_document_,
   textVar=text,
   language=chinese,
   dsOut=sentences_zh1
);
 
%sentenceTokenizer2(
   dsIn=text_zh,
   docVar=_document_,
   textVar=text,
   language=chinese,
   dsOut=sentences_zh2
);
 
/*-------------------------------------*/
/* Example 2: English texts            */
/*-------------------------------------*/
data sascas1.text_en;
   infile cards dlm='|' missover;
   input _document_ text :$500.;
   cards;
1|Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990.
2|I paid $23.45 for this book.
3|We earn more and more money, but we feel less and less happier. So…what happened to us?
;
run;   
 
%sentenceTokenizer1(
   dsIn=text_en,
   docVar=_document_,
   textVar=text,
   language=english,
   dsOut=sentences_en1
);
 
%sentenceTokenizer2(
   dsIn=text_en,
   docVar=_document_,
   textVar=text,
   language=english,
   dsOut=sentences_en2
);
 
 
/*-------------------------------------*/
/* Example 3: German texts             */
/*-------------------------------------*/
data sascas1.text_de;
   infile cards dlm='|' missover;
   input _document_ text :$600.;
   cards;
1|Was sind die Konsequenzen der Abstimmung vom 12. Juni?
;
run;   
 
%sentenceTokenizer1(
   dsIn=text_de,
   docVar=_document_,
   textVar=text,
   language=german,
   dsOut=sentences_de1
);
 
%sentenceTokenizer2(
   dsIn=text_de,
   docVar=_document_,
   textVar=text,
   language=german,
   dsOut=sentences_de2
);

Table 2 below shows the sentences extracted for the three examples.

Table 2: Text and extracted sentences, by method

English, Doc 1
Text: Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990.
Sentence (Method 1): Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990.
Sentence (Method 2): rolls-royce motor cars inc. said it expects its u.s. sales to remain steady at about 1,200 cars in 1990.

English, Doc 2
Text: I paid $23.45 for this book.
Sentence (Method 1): I paid $23.45 for this book.
Sentence (Method 2): i paid $23.45 for this book.

English, Doc 3
Text: We earn more and more money, but we feel less and less happier. So…what happened to us?
Sentences (Method 1): We earn more and more money, but we feel less and less happier. / So…what happened?
Sentences (Method 2): we earn more and more money, but we feel less and less happier. / so…what happened?

Chinese, Doc 1
Text: 北京确实人多车多,但是根源在哪里?在于首都集中了太多全国性资源。
Sentences (Method 1): 北京确实人多车多,但是根源在哪里? / 在于首都集中了太多全国性资源。
Sentences (Method 2): 北京确实人多车多,但是根源在哪里? / 在于首都集中了太多全国性资源。

German, Doc 1
Text: Was sind die Konsequenzen der Abstimmung vom 12. Juni?
Sentence (Method 1): Was sind die Konsequenzen der Abstimmung vom 12. Juni?
Sentence (Method 2): was sind die konsequenzen der abstimmung vom 12. juni?

From the above table, you can see that there is no difference between the two methods for the Chinese text, but many differences between them for the English and German texts. So which method should you use? It depends on the SAS products you have available. Method 1 depends on the compileConcept, validateConcept, and applyConcept actions and requires SAS Visual Text Analytics. Method 2 depends on the tpParse action in SAS Visual Analytics. If you have both products available, then consider your use case. If you are working on text analytics that are case insensitive, such as topic detection or text clustering, you may choose Method 2. Otherwise, if the text analytics are case sensitive, such as named entity recognition, you must choose Method 1. (And of course, if you don't have SAS Viya, you can use Method 3 with SAS 9 and guidance from the cited paper.)

If you have SAS Viya, I suggest trying the above sentence tokenization methods with your data and then running text mining actions on the sentence-level data to see what insights you get.

How to tokenize documents into sentences was published on SAS Users.

July 18, 2018
 

Last year, when I went through the SAS Global Forum 2017 paper list, the paper Breaking through the Barriers: Innovative Sampling Techniques for Unstructured Data Analysis impressed me a lot. In this paper, the author pointed out the common problems caused by traditional sampling methods and proposed four sampling methods for textual data. Recently my team has been working on a project in which we face a huge volume of documents from a specific field, and we need linguists and domain experts to analyze the textual data and annotate ground truth, so our first question was which documents we should start working on to get a panoramic view of the data with minimum effort. Frankly, I didn't have a state-of-the-art method to extract representative documents and measure its effectiveness, so why not try this innovative technique?

The paper proposed four sampling methods, and I tried only the first one, which uses cluster membership as strata. Before we step into the details of the SAS program, let me introduce the steps of this method.

  • Step 1: Parse textual data into tokens and calculate each term's TF-IDF value (the formula is shown after this list)
  • Step 2: Generate term-by-document matrix
  • Step 3: Cluster documents through k-means algorithm
  • Step 4: Get top k terms of each cluster
  • Step 5: Do stratified sampling by cluster
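
For reference, the TF-IDF value computed in Step 1 (and in the getTfidf macro in the appendix) is the raw term frequency multiplied by the log of the inverse document frequency:

tfidf(t, d) = tf(t, d) * log(N / df(t))

where tf(t, d) is the count of term t in document d, N is the total number of documents, and df(t) is the number of documents containing term t. This matches the appendix DATA step assignment _tfidf_ = _tf_ * log(totalDocs / _NumDocs_).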

I wrote a SAS macro for each step so that you can check the results step by step. If you are not satisfied with the final clustering result, you can tune the parameters of any step and re-run that step and the steps that follow it. Now let's see how to do this with SAS Viya to extract samples from movie review data.

The movie review data has 11,855 rows of observations containing 200,963 tokens. After removing stop words, there are 18,976 terms. In this example, I set the dimension size of the term-by-document matrix to 3,000, which means I use the 3,000 terms with the highest TF-IDF values across the document collection as its dimensions. Then I use k-means clustering to group documents into K clusters, setting the maximum K to 50 with the kClus action in CAS. The dataSegment action can also cluster documents directly, but it cannot choose the best K; you would need to run it with different K values and pick the best K yourself. In contrast, the kClus action chooses the best K automatically between the minimum K and maximum K you define, so I use kClus in my implementation.

After running the program (full code at the end of this post), I got 39 clusters; Table-1 shows the top 10 terms of the first cluster.

Table-1 Top 10 terms of Cluster 1

Let's see what samples we get for the first cluster. I got 7 documents, and each document contains either the term "predictable" or the term "emotional."

Samples from cluster

I set sampPct to 5, which means 5% of the data is randomly selected from each cluster. In the end I got 582 sample documents. Let's check the sample distribution across clusters.

Donut chart of cluster samples

This clustering method helped us intelligently select a small subset of documents from a large collection, and most importantly it saved us a lot of time and helped us hit the mark.

I haven't had a chance to try the other three sampling methods from the paper; I encourage you to try them and share your experiences with us. Big thanks to my colleague Murali Pagolu for sharing this innovative technique during the SAS Global Forum 2017 conference and for kindly providing me with some good suggestions.

Appendix: Complete code for text sampling

 
/*-------------------------------------------*/
/* Sampling textual data based on clustering */
/*-------------------------------------------*/
 
 
/*-------------------------------------*/
/* Get tfidf                           */
/*-------------------------------------*/
%macro getTfidf(
   dsIn=, 
   docVar=, 
   textVar=, 
   language=, 
   stemming=true, 
   stopList=, 
   dsOut=
);
proc cas;
textparse.tpParse /
   docId="&docVar"
   documents={name="&dsIn"}
   text="&textVar"
   language="&language"
   cellWeight="NONE"
   stemming=false
   tagging=false
   noungroups=false
   entities="none"
   offset={name="tpparse_out",replace=TRUE}
;
run;
 
textparse.tpAccumulate /
   offset={name="tpparse_out"}
   stopList={name="&stopList"}
   termWeight="NONE"
   cellWeight="NONE"
   reduce=1
   parent={name="tpAccu_parent",replace=TRUE}
   terms={name="tpAccu_term",replace=TRUE}
   showdroppedterms=false
;
run;
quit;
 
proc cas;
loadactionset "fedsql";
execdirect casout={name="doc_term_stat", replace=true} 
query="
      select tpAccu_parent.&docVar, 
             tpAccu_term._term_,
             tpAccu_parent._count_ as _tf_,
             tpAccu_term._NumDocs_
      from tpAccu_parent
      left join tpAccu_term
      on tpAccu_parent._Termnum_=tpAccu_term._Termnum_;
"
;
run;
 
simple.groupBy / 
   table={name="tpAccu_parent"}
   inputs={"&docVar"}
   casout={name="doc_nodup", replace=true};
run;
 
numRows result=r / 
   table={name="doc_nodup"};
totalDocs = r.numrows;
run;
 
datastep.runcode /
code = "
   data &dsOut;
      set doc_term_stat;"
   ||"_tfidf_ = _tf_*log("||totalDocs||"/_NumDocs_);"
   ||"run;
";
run;
quit;
 
proc cas;
   table.dropTable name="tpparse_out" quiet=true; run;
   table.dropTable name="tpAccu_parent" quiet=true; run;
   table.dropTable name="tpAccu_term" quiet=true; run;
   table.dropTable name="doc_nodup" quiet=true; run;
   table.dropTable name="doc_term_stat" quiet=true; run;
quit;
%mend getTfidf;
 
 
/*-------------------------------------*/
/* Term-by-document matrix             */
/*-------------------------------------*/
%macro DocToVectors(
   dsIn=, 
   docVar=, 
   termVar=, 
   tfVar=, 
   dimSize=500, 
   dsOut=
);
proc cas;
simple.summary /
   table={name="&dsIn", groupBy={"&termVar"}}
   inputs={"&tfVar"}
   summarySubset={"sum"}
   casout={name="term_tf_sum", replace=true};
run;
 
simple.topk / 
   table={name="term_tf_sum"}  
   inputs={"&termVar"} 
   topk=&dimSize
   bottomk=0 
   raw=True 
   weight="_Sum_"
   casout={name='termnum_top', replace=true};
run;
 
loadactionset "fedsql";
execdirect casout={name="doc_top_terms", replace=true} 
query="
      select termnum.*, _rank_
      from &dsIn termnum, termnum_top
      where termnum.&termVar=termnum_top._Charvar_
        and &tfVar!=0;
"
;
run;
 
transpose.transpose /
   table={name="doc_top_terms", 
          groupby={"&docVar"}, 
          computedVars={{name="_name_"}},
          computedVarsProgram="_name_='_dim'||strip(_rank_)||'_';"}  
   transpose={"&tfVar"}
   casOut={name="&dsOut", replace=true};
run;
quit;
 
proc cas;
   table.dropTable name="term_tf_sum" quiet=true; run;
   table.dropTable name="termnum_top" quiet=true; run;
   table.dropTable name="termnum_top_misc" quiet=true; run;
   table.dropTable name="doc_top_terms" quiet=true; run;
quit;
%mend DocToVectors;
 
 
/*-------------------------------------*/
/* Cluster documents                   */
/*-------------------------------------*/
%macro clusterDocs(
   dsIn=, 
   nClusters=10,
   seed=12345,   
   dsOut=
);
proc cas;
/*get the vector variables list*/
columninfo result=collist /
   table={name="&dsIn"};
ndimen=dim(collist['columninfo']);
vector_columns={};
j=1;
do i=1 to ndimen;
   thisColumn = collist['columninfo'][i][1];
   if lowcase(substr(thisColumn, 1, 4))='_dim' then do;
      vector_columns[j]= thisColumn;
      j=j+1;
   end;
end;
run;
 
clustering.kClus / 
   table={name="&dsIn"},
   nClusters=&nClusters,
   init="RAND",
   seed=&seed,
   inputs=vector_columns,
   distance="EUCLIDEAN",
   printIter=false,
   impute="MEAN",
   standardize='STD',
   output={casOut={name="&dsOut", replace=true}, copyvars="ALL"}
;
run;
quit;
%mend clusterDocs;
 
 
/*-------------------------------------*/
/* Get top-k words of each cluster     */
/*-------------------------------------*/
%macro clusterProfile(
   termDS=, 
   clusterDS=, 
   docVar=, 
   termVar=, 
   tfVar=, 
   clusterVar=_CLUSTER_ID_, 
   topk=10, 
   dsOut=
);
proc cas;
loadactionset "fedsql";
execdirect casout={name="cluster_terms",replace=true} 
query="
      select &termDS..*, &clusterVar
      from &termDS, &clusterDS
      where &termDS..&docVar = &clusterDS..&docVar;
"
;
run;
 
simple.summary /
   table={name="cluster_terms", groupBy={"&clusterVar", "&termVar"}}
   inputs={"&tfVar"}
   summarySubset={"sum"}
   casout={name="cluster_terms_sum", replace=true};
run;
 
simple.topk / 
   table={name="cluster_terms_sum", groupBy={"&clusterVar"}}  
   inputs={"&termVar"} 
   topk=&topk
   bottomk=0 
   raw=True 
   weight="_Sum_"
   casout={name="&dsOut", replace=true};
run;
quit;
 
proc cas;
   table.dropTable name="cluster_terms" quiet=true; run;
   table.dropTable name="cluster_terms_sum" quiet=true; run;
quit;
%mend clusterProfile;
 
 
/*-------------------------------------*/
/* Stratified sampling by cluster      */
/*-------------------------------------*/
%macro strSampleByCluster(
   docDS=, 
   docClusterDS=, 
   docVar=, 
   clusterVar=_CLUSTER_ID_, 
   seed=12345,   
   sampPct=, 
   dsOut=
);
proc cas;
loadactionset "sampling";
stratified result=r /
   table={name="&docClusterDS", groupby={"&clusterVar"}}
   sampPct=&sampPct 
   partind="TRUE" 
   seed=&seed
   output={casout={name="sampling_out",replace="TRUE"},
                   copyvars={"&docVar", "&clusterVar"}};
run;
print r.STRAFreq; run;
 
loadactionset "fedsql";
execdirect casout={name="&dsOut", replace=true} 
query="
   select docDS.*, &clusterVar
   from &docDS docDS, sampling_out
   where docDS.&docVar=sampling_out.&docVar
     and _PartInd_=1;
"
;
run;
 
proc cas;
   table.dropTable name="sampling_out" quiet=true; run;
quit; 
%mend strSampleByCluster;
 
/*-------------------------------------*/
/* Start CAS Server.                   */
/*-------------------------------------*/
cas casauto host="host.example.com" port=5570;
libname sascas1 cas;
caslib _all_ assign;
 
/*-------------------------------------*/
/* Prepare and load data.              */
/*-------------------------------------*/
%let myData=movie_reviews;
 
proc cas;
loadtable result=r / 
   importOptions={fileType="csv", delimiter='TAB',getnames="true"}
   path="data/movie_reviews.txt"
   casLib="CASUSER"
   casout={name="&myData", replace="true"} ;
run;
quit;
 
/* Browse the data */
proc cas;
   columninfo / table={name="&myData"};
   fetch / table = {name="&myData"};
run;
quit;
 
/* generate one unique index using data step */
proc cas;
datastep.runcode /
code = "
   data &myData;
      set &myData;
      rename id = _document_;
      keep id text score;  
   run;
";
run;
quit;
 
/* create stop list*/
data sascas1.stopList;
   set sashelp.engstop;
run;
 
/* Get tfidf by term by document */
%getTfidf(
   dsIn=&myData, 
   docVar=_document_, 
   textVar=text, 
   language=english, 
   stemming=true, 
   stopList=stopList, 
   dsOut=doc_term_tfidf
);
 
/* document-term matrix */
%DocToVectors(
   dsIn=doc_term_tfidf, 
   docVar=_document_, 
   termVar=_term_, 
   tfVar=_tfidf_, 
   dimSize=3000, 
   dsOut=doc_vectors
);
 
/* Cluster documents */
%clusterDocs(
   dsIn=doc_vectors, 
   nClusters=50, 
   seed=12345,   
   dsOut=doc_clusters
);
 
/* Get top-k words of each cluster */
%clusterProfile(
   termDS=doc_term_tfidf, 
   clusterDS=doc_clusters, 
   docVar=_document_, 
   termVar=_term_, 
   tfVar=_tfidf_, 
   clusterVar=_cluster_id_, 
   topk=10, 
   dsOut=cluster_topk_terms
);
 
/* Browse topk terms of the first cluster */
proc cas;
fetch / 
   table={name="cluster_topk_terms",
          where="_cluster_id_=1"};
run;
quit;
 
/* Stratified sampling by cluster      */
%strSampleByCluster(
   docDS=&myData, 
   docClusterDS=doc_clusters, 
   docVar=_document_, 
   clusterVar=_cluster_id_, 
   seed=12345,   
   sampPct=5,
   dsOut=doc_sample_by_cls
);
 
/* Browse sample documents of the first cluster */
proc cas;
fetch / 
   table={name="doc_sample_by_cls",
          where="_cluster_id_=1"};
run;
quit;

How to sample textual data with SAS was published on SAS Users.

June 9, 2018
 

SAS Studio is the latest way you can access SAS. This newer interface allows users to reach SAS through a web browser, and it offers a number of unique ways to work with SAS. At SAS Global Forum 2018, Lora Delwiche (SAS) and Susan J Slaughter (Avocet Solutions) gave the presentation "SAS Studio: A New Way to Program in SAS." This post reviews the paper, offering insights on how to get more out of programming in SAS Studio.

This interface is a popular one, as it is included with Base SAS and used for SAS University Edition and SAS OnDemand for Academics. It can be considered a self-service system: you write programs in SAS Studio itself, SAS processes them, and the results are delivered back to your browser. Its easy accessibility from a range of computers is putting it in high demand – which is why you should learn how to make the most of it.

How to operate

A SAS server processes your code and returns the results to your browser. Operating in Programmer mode gives you the ability to view the Code, Log, and Results tabs. You write your code on the right side of the screen, and the toolbar gives you access to the many different tools that are offered.

SAS Studio

Libraries are used to access your SAS data sets, and within a library you can also see the variables contained in each data set. You can create your own libraries and set the path to your folder through SAS Studio.
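
If you prefer code over the point-and-click route, a library can also be assigned with a LIBNAME statement. This is a minimal sketch; the path shown is only an illustrative placeholder:

/* Assign a library to a folder; replace the path with one you can access */
libname mydata "/folders/myfolders/sas_data";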

The navigation pane can also be used to view each data set. Right-click on the data set name and select "Open" to access files this way. These data sets can be adjusted in a number of ways: columns can be rearranged by dragging their headings; column sizes can be adjusted; the arrows in the top right corner let you view more information; and clicking on a column heading sorts that data.

 

In order to control your data easily, filters can be used. Filters are accessed by right-clicking the column heading and selecting the filter that best fits your needs.

How to successfully code

A unique feature of SAS Studio is its code editor, which can automatically format your code. Clicking the format icon puts each statement on its own line with proper indentation. Additionally, syntax help pops up as you type to offer suggestions for your syntax, a tool that can be turned on or off through the Preferences window.

One tool that’s particularly useful is the snippet tool, where you can copy and paste frequently used code.

Implementation and results

After code is written, the Log tab helps you review any notes, warnings, and errors from your code, whereas the Results tab shows the output generated once your code has been processed. The Results tab gives you shareable items that can be saved or printed for analysis purposes.

Conclusion

These insights offer just a glimpse of all of the capabilities in programming through SAS Studio. Through easy browser access, your code can be shared and analyzed with a few clicks.

Additional Resources

Additional SAS Global Forum Proceedings
SAS Studio Videos
SAS Studio Courses
SAS Studio Programming Starter Guide
SAS Studio Blogs
SAS Studio Community

Other SAS Global Forum Programming Papers of Interest

Code Like It Matters: Writing Code That's Readable and Shareable
Paul Kaefer

Identifying Duplicate Variables in a SAS ® Data Set
Bruce Gilsen

Macros I Use Every Day (And You Can, Too!)
Joe DeShon

Merge with Caution: How to Avoid Common Problems when Combining SAS Datasets
Joshua M. Horstman

SAS Studio: A new way to program in SAS was published on SAS Users.

June 2, 2018
 

The SAS Platform

For software users and SAS administrators, the question often becomes how to streamline their approach into the easiest-to-use system that most effectively completes the task at hand. At SAS Global Forum 2018, the topic of a "Big Red Button" was an idea that got audience members asking – is there a way to have just a few clicks complete all the stages of the software administration lifecycle? In this article, we review Sergey Iglov's SAS Global Forum paper "A 'Big Red Button' for SAS Administrators: Myth or Reality?" to get a better understanding of what this could look like, and how it could change administrators' jobs for the better. Iglov is a director at SASIT Limited.

What is a “Big Red Button?”

With the many different ways the SAS Platform can be utilized, there is a question as to whether there is a single process that can control “infrastructure provisioning, software installation and configuration, maintenance, and decommissioning.” It has been believed that each of these steps has a different process; however, as Iglov concluded, there may be a way to integrate these steps together with the “Big Red Button.”

This mystery “button” that Iglov talked about would allow administrators to easily add or delete parts of the system and automate changes throughout; thus, the entire program could adapt to the administrator’s needs with a simple click.

Software as a System – SAS Viya and cloud-based technologies

Right now, SAS Viya supports automating software deployment processes through centralized management. Through insights easily created and shared in the cloud, SAS Viya stands out, as users can access a centrally hosted control panel instead of needing individual installations.

Using CloudFormation by Amazon Web Services

At this point, the “Big Red Button” points toward systems such as CloudFormation. CloudFormation allows users of Amazon Web Services to lay out the infrastructure needed for their product visually, and easily make changes that will affect the software. As Iglov said, “Once a template is deployed using CloudFormation it can be used as a stack to simplify resources management. For example, when a stack is deleted all related resources are deleted automatically as well.”

Conclusion

Connecting to SAS Viya, CloudFormation can install and configure the system, and make changes. This would help SAS administrators adapt the product to their needs, in order to derive intelligence from data. While the future potential to use a one-click button is out there for many different platforms, using cloud based software and programs such as CloudFormation enable users to go through each step of SAS Platform’s administration lifecycle efficiently and effectively.

Additional Resources

SAS Viya Brochure
Sergey Iglov: "A 'Big Red Button' for SAS administrators: Myth or Reality?"

Additional SAS Global Forum 2018 talks of interest for SAS Administrators

A Programming Approach to Implementing SAS® Metadata-Bound Libraries for SAS® Data Set Encryption Deepali Rai, SAS Institute Inc.

Command-Line Administration in SAS® Viya®
Danny Hamrick, SAS

External Databases: Tools for the SAS® Administrator
Mathieu Gaouette, Prospective MG inc.

SAS® Environment Manager – A SAS® Viya® Administrator’s Swiss Army Knife
Michelle Ryals, Trevor Nightingale, SAS Institute Inc.

Troubleshooting your SAS® Grid Environment
Jason Hawkins, Amadeus Software Limited

Multi-Factor Authentication with SAS® and Symantec VIP
Jody Steadman, Mike Roda, SAS Institute Inc.

OpenID Connect Opens the Door to SAS® Viya® APIs
Mike Roda, SAS Institute Inc.

Understanding Security for SAS® Visual Analytics 8.2 on SAS® Viya®
Antonio Gianni, Faisal Qamar, SAS Institute Inc.

Latest and Greatest: Best Practices for Migrating to SAS® 9.4
Alec Fernandez, Leigh Fernandez, SAS Institute Inc.

Planning for Migration from SAS® 9.4 to SAS® Viya®
Don B. Hayes, DLL Consulting Inc.; Spencer Hayes, Cached Consulting LLC; Michael Shealy, Cached Consulting LLC; Rebecca Hayes, Green Peach Consulting Inc.

SAS® Viya®: Architect for High Availability Now and Users Will Thank You Later
Jerry Read, SAS Institute Inc.

Taming Change: Bulk Upgrading SAS® 9.4 Environments to a New Maintenance Release
Javor Evstatiev, Andrey Turlov

Is there a “Big Red Button” to use The SAS Platform? was published on SAS Users.

May 30, 2018
 

SAS Enterprise Miner has been a leader in data mining and modeling for over 20 years. The system offers over 80 different nodes that help users analyze, score and model their data. With a wide range of functionalities, there can be a number of different ways to produce the results you want.

At SAS® Global Forum 2018, Principal Systems Engineer Melodie Rush spoke about her experience with SAS® Enterprise Miner™ and compiled a list of hints that she believes will help users of all levels. This article previews her full presentation, Top 10 Tips for SAS Enterprise Miner Based on 20 Years’ Experience. The paper includes images and further details for each of the tips noted below; I’d encourage you to check it out to learn more.

Top Ten Tips for Enterprise Miner

Tip 1: How to find the node you’re looking for

If you struggle to find the node that best fits what you need, there's a system that can simplify the search.

Nodes are organized by Sample, Explore, Modify, Model, and Assess. Find which of these best describes what you are trying to do, and scroll across each node alphabetically for a description.

Tip 2: Add node from diagram workspace

Double-click any node on the toolbar to see its properties. An example of the result this presents is shown below:

Top Ten Tips for Enterprise Miner

Tip 3: Clone a process flow

Highlight the process flow by dragging your mouse across it, copy it with right-click Copy or Ctrl+C, and paste it with right-click Paste or Ctrl+V where you want to insert the process flow.

Tip 4: New features

  • There’s a new tab, HPDM (High-Performance Data Mining), which contains several new nodes that cover data mining and machine learning algorithms.
  • There are two new nodes under Utility that incorporate Open Source and SAS Viya.
  • The Open Source Integration node allows you to use R language code in SAS Enterprise Miner diagrams.
  • A SAS Viya Code node now incorporates code that will be used in SAS Viya and CAS, and algorithms from SAS Visual Data Mining and Machine Learning.
  • To save and share your results, there are now the Register Model and Save Data nodes under Utility.
  • You can now register models to the SAS Metadata Server to score or compare easily.
  • A Save Data node lets you save training, validation, test, score, or transaction data as SAS, JMP, Excel, CSV or tab-delimited files.

Tip 5: The unknown node

The reporter node under Utility allows you to easily document your Enterprise Miner process flow diagrams. A .pdf or .rtf is created with an image of the process flow.

Tip 6: The node that changes everything

The Metadata node, on the Utility tab, allows you to change metadata information and values in your diagram. You also can capture settings to then apply to data in another diagram.

Tip 7: How to generate a scorecard

A scorecard emphasizes which variables and values from your model are important. Values are reported on a 0 to 1,000 scale, with higher values meaning the event you're measuring is more likely to occur. To do this, have the Reporter node follow a Score node, and then change the Nodes property to Summary under the Reporter node properties.

Tip 8: How to override the 512 level limit

If you're faced with the error message "Maximum target levels of 512 exceeded," your target variable has more than 512 distinct levels. To get around this, you need to change EM_TRAIN_MAXLEVELS to another value. To do so, either change the macro value in the properties

or change the macro value in project start code.
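
As a hedged sketch of the project start code approach (the value 1024 is only an example; pick one larger than the number of distinct levels in your target), the macro variable can be set like this:

/* Project start code: raise the maximum number of target levels */
%let EM_TRAIN_MAXLEVELS = 1024;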

Tip 9: Which variable selection method should I use?

Instead of choosing just one variable selection method, you can combine different ones such as Decision Trees, Forward, Chi-Square, and others. The results can be combined using different selection properties, such as None (no changes made from original metadata), Any (reject a variable if any previous variable selection nodes reject it), All (reject a variable if all of the previous variable selection nodes reject it), and Majority (reject a variable if the majority of the variable selection nodes reject it).

Tip 10: Interpreting neural network

Decision trees can be produced to interpret neural networks by changing the prediction variable to be your target and setting the original target variable to rejected.

Conclusion

With so many options for creating models that best suit your preferences, these tips will help sharpen your focus and allow you to use SAS Enterprise Miner more efficiently and effectively. This presentation was one in a series of talks on the Enterprise Miner tool presented at SAS® Global Forum 2018.

Additional Resources

SAS Enterprise Miner
SAS Enterprise Learning Tutorials
Getting Started With SAS Enterprise Miner Tutorial Videos

Additional SAS Enterprise Miner talks from Global Forum 2018

A Case Study of Mining Social Media Data for Disaster Relief: Hurricane Irma
Bogdan Gadidov, Linh Le, Analytics and Data Science Institute, Kennesaw State University

A Study of Modelling Approaches for Predicting Dropout in a Business College
Xuan Wang, Helmut Schneider, Louisiana State University

Analysis of Nokia Customer Tweets with SAS® Enterprise Miner™ and SAS® Sentiment Analysis Studio
Vaibhav Vanamala MS in Business Analytics, Oklahoma State University

Analysis of Unstructured Data: Topic Mining & Predictive Modeling using Text
Ravi Teja Allaparthi

Association Rule Mining of Polypharmacy Drug Utilization Patterns in Health Care Administrative Data Using SAS® Enterprise Miner™
Dingwei Dai, Chris Feudtner, The Children’s Hospital of Philadelphia

Bayesian Networks for Causal Analysis
Fei Wang and John Amrhein, McDougall Scientific Ltd.

Classifying and Predicting Spam Messages Using Text Mining in SAS® Enterprise Miner™
Mounika Kondamudi, Oklahoma State University

Image Classification Using SAS® Enterprise Miner 14.1

Model-Based Fiber Network Expansion Using SAS® Enterprise Miner™ and SAS® Visual Analytics
Nishant Sharma, Charter Communications

Monte Carlo K-Means Clustering SAS Enterprise Miner
Donald K. Wedding, PhD Director of Data Science Sprint Corporation

Retail Product Bundling – A new approach
Bruno Nogueira Carlos, Youman Mind Over Data

Using Market Basket Analysis in SAS® Enterprise MinerTM to Make Student Course Enrollment Recommendations
Shawn Hall, Aaron Osei, and Jeremiah McKinley, The University of Oklahoma

Using SAS® Enterprise Miner for Categorization of Customer Comments to Improve Services at USPS
Olayemi Olatunji, United States Postal Service Office of Inspector General

Top 10 tips for SAS Enterprise Miner based on 20 years’ experience was published on SAS Users.