SVD is at the heart of many modern machine learning algorithms. As a computing vehicle for PCA, SVD can be obtained using PROC PRINCOMP on the covariance matrix of a given data matrix without correction for the intercept. With SVD, we are ready to carry out many useful tasks that are not readily available in SAS/STAT, such as text mining using LSI (the default algorithm in SAS Text Miner [1]), multivariate time series analysis using MSSA, logistic PLS, and so on.
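In matrix terms, the connection can be sketched as follows (this assumes, as the macro below does via its SinguVal = sqrt(eigenvalue * nobs) formula, that PROC PRINCOMP with NOINT and COV divides the cross-product matrix by the number of observations n):

```latex
\begin{align}
X &= U S V^{T} \\
\frac{1}{n} X^{T} X &= V \left(\frac{S^{2}}{n}\right) V^{T}
\end{align}
```

So the eigenvalues \(\lambda_i\) reported by PROC PRINCOMP relate to the singular values of X by \(s_i = \sqrt{\lambda_i \, n}\), which is how the macro recovers singular values from the OUTSTAT dataset.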

I also highly recommend the book "Principal Component Analysis", 2nd Edition, by I. T. Jolliffe. Prof. Jolliffe gives a thorough review of PCA and its applications in various fields, and provides a road map for further research and reading.


%macro SVD(
input_dsn,
output_V,
output_S,
output_U,
input_vars,
ID_var,
nfac=0
);

%local blank  para  EV  USCORE  n  pos  dsid  nobs  nstmt  x  outstmt  options
shownote  showsource  ;

%let shownote=%sysfunc(getoption(NOTES));
%let showsource=%sysfunc(getoption(SOURCE));
options nonotes  nosource;

%let blank=%str( );
%let EV=EIGENVAL;
%let USCORE=USCORE;

%let n=%sysfunc(countW(&input_vars));

%let dsid=%sysfunc(open(&input_dsn));
%let nobs=%sysfunc(attrn(&dsid, NOBS));
%let dsid=%sysfunc(close(&dsid));
%if  &nfac eq 0 %then %do;
%let nstmt=&blank; %let nfac=&n;
%end;
%else %do;
%let x=%sysfunc(notdigit(&nfac, 1));
%if  &x eq 0 %then %do;
%let nfac=%sysfunc(min(&nfac, &n));
%let nstmt=%str(n=&nfac);
%end;
%else %do;
%put ERROR: nfac must be a non-negative integer.;
%goto exit;
%end;
%end;

/* calculate U=XV/S */
%if &output_U ne %str() %then %do;
%let outstmt=  out=&output_U.(keep=&ID_var  Prin:);
%end;
%else %do;
%let outstmt=&blank;
%end;

%let options=noint cov noprint  &nstmt;

proc princomp data=&input_dsn
/* out=&input_dsn._score */
&outstmt
outstat=&input_dsn._stat(where=(_type_ in ("&USCORE", "&EV")))  &options;
var &input_vars;
run;
data &output_S;
set &input_dsn._stat;
format Number 7.0;
format EigenValue Proportion Cumulative 7.4;
keep Number EigenValue  Proportion Cumulative;
where _type_="&EV";
array _X{&n} &input_vars;
Total=sum(of &input_vars);
Cumulative=0;
do Number=1 to dim(_X);
EigenValue=_X[number];
Proportion=_X[Number]/Total;
Cumulative=Cumulative+Proportion;
output;
end;
run;

%if &output_V ne %str() %then %do;
proc transpose data=&input_dsn._stat(where=(_TYPE_="&USCORE"))
out=&output_V.(rename=(_NAME_=variable))
name=_NAME_;
var &input_vars;
id _NAME_;
format &input_vars 8.6;
run;
%end;

/* recompute Proportion */
%if &output_S ne %str() %then %do;
data &output_S;
set &input_dsn._stat ;
where _TYPE_="EIGENVAL";
array _s{*} &input_vars;
array _x{&nfac, 3} _temporary_;
Total=sum(of &input_vars, 0);
_t=0;
do _i=1 to &nfac;
_x[_i, 1]=_s[_i]; _x[_i, 2]=_s[_i]/Total;
if _i=1 then _x[_i, 3]=_x[_i, 2];
else _x[_i, 3]=_x[_i-1, 3]+_x[_i, 2];
_t+sqrt(_x[_i, 2]);
end;
do _i=1 to &nfac;
Number=_i;
EigenValue=_x[_i, 1]; Proportion=_x[_i, 2]; Cumulative=_x[_i, 3];
S=sqrt(_x[_i, 2])/_t;  SinguVal=sqrt(_x[_i, 1] * &nobs);
keep Number EigenValue  Proportion Cumulative  S SinguVal;
output;
end;
run;
%end;

%if &output_U ne %str() %then %do;
data &output_U;
array _S{&nfac}  _temporary_;
if _n_=1 then do;
do j=1 to &nfac;
set  &output_S(keep=SinguVal)  point=j;
_S[j]=SinguVal;
if abs(_S[j]) < CONSTANT('MACEPS') then _S[j]=CONSTANT('BIG');
end;
end;
set &output_U;
array _A{*}  Prin1-Prin&nfac;
do _j=1 to dim(_A);
_A[_j]=_A[_j]/_S[_j];
end;
keep &ID_var Prin1-Prin&nfac ;
run;
%end;

%exit:
options &shownote  &showsource;
%mend;


Try the following sample code to examine the results:


data td;
input x1 x2;
cards;
2 0
0 -3
;
run;

%let input_dsn=td;
%let id_var= ;
%SVD(&input_dsn,
output_V,
output_S,
output_U,
x1  x2,
&id_var,
nfac=0
);
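To see what the macro should produce for this tiny example, here is a pure-Python sanity check of the identity the macro's SinguVal column relies on (SinguVal = sqrt(eigenvalue * nobs)); the numbers are worked out by hand for the td data, not taken from actual SAS output:

```python
# td data from the example above
X = [[2, 0], [0, -3]]
nobs = len(X)

# X'X (cross-products, no intercept correction) -- diagonal for this data
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]

# PROC PRINCOMP (NOINT, COV) eigenvalues: X'X / nobs (trivial here, since X'X is diagonal)
eigenvalues = [xtx[0][0] / nobs, xtx[1][1] / nobs]

# Singular values recovered via the macro's formula: sqrt(eigenvalue * nobs)
singular_values = sorted((ev * nobs) ** 0.5 for ev in eigenvalues)
print(singular_values)  # [2.0, 3.0] -- the singular values of X are |2| and |-3|
```

The output_S dataset from the macro call above should list SinguVal values matching these, up to ordering by eigenvalue.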


Reference:
[1] Albright, Russ, "Taming Text with the SVD", SAS Institute Inc., Cary, NC, available at :

[2] Jolliffe, I. T. , "Principal Component Analysis", 2nd Ed., Springer Series in Statistics, 2002

About a month ago, I was telling someone a story about my morning jog. I described what I do as “ambulating”. She looked at me strangely and asked if I worked in the medical field because no one uses the word “ambulate” in everyday language. I laughed and said, "no." I explained that I hate to use the term “run” or “jog” because it might insult people who are really good at that kind of exercise. My speed and form leave a lot of room for improvement!

Actually, I am in the Americas Business Analytics Practice at SAS, and my team works with many organizations on a wide variety of analytics projects. With SAS’ recent release of our new Text Analytics offerings, we’ve been busier than ever. Companies around the world are asking for more information about how to apply the technology to their business problems. They understand that they have been accumulating a lot of text data without gaining much business value from it.

I have this theory that there are two distinct types of people, those that are most comfortable with ”math and science”, and those that are most comfortable with “language." I refer to the first group as being from the “Math World” and the second group from the “Word World." Text Analytics requires people from both worlds to come together to interpret words and concepts in context, organize, explore, and analyze them to provide actionable insights for business.

Although I consider myself from the “Math World,” I guess my recent work in the “Word World” has bled into my personal life. We are working with an insurance company to categorize various accident and injury reports for more rapid claims processing as well as provide risk assessments. Synthesizing medical terminology in the claims might be how “ambulate” snuck into my vocabulary.

In a recent article titled "Text Analytics – Two Worlds Collide" published in BeyeNetwork, I tried to demystify Text Analytics, by creating parallels between the “Word World” and the “Math World”, and by providing lots of examples of how text analytics could be used in business. I tried to make it as jargon-free as possible.

Which “world” do our readers consider themselves from?

SAS users groups are run by SAS users, with support, but not directives, from SAS. During my 15-year tenure in the SAS users group program, I’ve seen up close how important user feedback is in helping to improve the conference experience. Typically we gather feedback informally in presentation sessions and in the exhibit and demo rooms. This year, however, we’re formalizing the feedback opportunity by offering two special focus group sessions to explore your ideas for SAS Global Forum 2011 and regional/special interest SAS users group conferences.

Refreshments and free gifts will be provided for those who participate, but hurry, because space is limited. These sessions are a great opportunity for both new and veteran conference attendees to share ideas and provide feedback to conference planning teams. Your opinion does count, and you can help make a difference.

http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap41/sect30.htm

\begin{align}
D &= -2(\ln(\text{likelihood for null model}) - \ln(\text{likelihood for alternative model})) \\
  &= -2\ln\left( \frac{\text{likelihood for null model}}{\text{likelihood for alternative model}} \right).
\end{align}

%macro LRMixtureTest(fullmodel=, redmodel=, DFfull=, DFred=);
/* This macro calculates the likelihood ratio test (LRT) for fixed effects in a
   mixed model (see Singer, Applied Longitudinal Data Analysis, pp. 116-120),
   and also calculates the mixture-method p-value for random effects in a mixed
   model (see KKNM, chapter 26).

   &fullmodel refers to the output dataset produced by running the full model
   in PROC MIXED with the statement:
       ods output FitStatistics=fm;
   &DFfull refers to the output dataset produced by running the full model in
   PROC MIXED and adding to the statement:
       ods output FitStatistics=fm SolutionF=SFfm;
   The number of parameters being tested (i.e., fixed effects) determines the
   degrees of freedom in the LRT and comes from this dataset.
   &redmodel refers to the output dataset produced by running the reduced model
   in PROC MIXED with the statement:
       ods output FitStatistics=rm SolutionF=SFrm;

   Maximum likelihood (ML) should be used when comparing fixed effects because
   it maximizes the likelihood of the sample data. Restricted maximum
   likelihood (REML) should be used only when assessing a random term (i.e.,
   the fixed effects should be the same in both models), because REML maximizes
   the likelihood of the residuals. The degrees of freedom would not be correct
   if the number of fixed effects differed between the models under REML, which
   is the default in PROC MIXED. */

data &fullmodel;
  set &fullmodel;
  if descr='-2 Res Log Likelihood' then chisqfullRML=value;   *REML;
  else if descr='-2 Log Likelihood' then chisqfullFML=value;  *ML;
run;

data &redmodel;
  set &redmodel;
  if descr='-2 Res Log Likelihood' then chisqredRML=value;    *REML;
  else if descr='-2 Log Likelihood' then chisqredFML=value;   *ML;
run;

*Count the number of effects to get the degrees of freedom;
proc sql;
  create table flDF as select effect, count(*) as fullDF from work.&DFfull;
quit;
proc sql;
  create table rdDF as select effect, count(*) as redDF from work.&DFred;
quit;

data degfree (drop=effect);
  merge work.flDF work.rdDF;
  if _n_=1 then LRTDF=fullDF - redDF;
run;

data likelihood;
  merge &fullmodel &redmodel degfree;
  **Models can yield negative LLs - those that are smaller in absolute value, i.e. closer to 0, fit better (pp. 116-117, Singer);
  testintRML=abs(chisqredRML - chisqfullRML);
  testintFML=abs(chisqredFML - chisqfullFML);
  pvaluemixture=.5*(1-probchi(testintRML,2)) + .5*(1-probchi(testintRML,1));   *for random terms;
  pvalueLRT=1-probchi(testintFML,LRTDF);                                       *for fixed terms;
run;

proc print data=likelihood split='*' noobs;
  var testintFML pvalueLRT;
  format pvalueLRT 6.4;
  where testintFML ne .;
  label testintFML='likelihood ratio*test statistic*comparing*reduced model to full model'
        pvalueLRT='p-value LRT';
  title "Likelihood Ratio Test for fixed effects";
run;

proc print data=likelihood split='*' noobs;
  var testintRML pvaluemixture;
  format pvaluemixture 6.4;
  where testintRML ne .;
  label testintRML='likelihood ratio*test statistic*comparing*reduced model to full model'
        pvaluemixture='mixture method p-value';
  title "Mixture method Test for random effects";
run;
%mend LRMixtureTest;

• fullmodel: the -2 log likelihood value of the full model
• redmodel: the -2 log likelihood value of the reduced model
• DFfull: the degrees of freedom of the full model
• DFred: the degrees of freedom of the reduced model

%include '\\cdc\private\mixture method pvalue macro1.sas';

proc mixed data=bhb7.allUS2levge10blk method=ml covtest empirical noclprint;
  class state_id;
  model diffcov=pctblk / ddfm=contain s;
  random intercept pctblk / subject=state_id type=ar(1) s;
  ods output FitStatistics=fm SolutionF=SFfm;
run;

proc mixed data=bhb7.allUS2levge10blk method=ml covtest empirical noclprint;
  class state_id;
  model diffcov= / ddfm=contain s;
  random intercept pctblk / subject=state_id type=ar(1) s;
  ods output FitStatistics=rm SolutionF=SFrm;
run;

%LRMixtureTest(fullmodel=fm,redmodel=rm,DFfull=SFfm,DFred=SFrm);
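For intuition, the two tail probabilities behind the macro's pvaluemixture can be sketched in pure Python. This is only a sketch, not SAS's probchi: the closed-form chi-square tails below are valid only for 1 and 2 degrees of freedom, which is all the single-random-term mixture case needs.

```python
import math

def chi2_sf(x, df):
    """Upper-tail (survival) probability of a chi-square variate.
    Closed forms exist for df = 1 and df = 2, the only cases needed here."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed form implemented only for df in {1, 2}")

def mixture_pvalue(lrt):
    """50:50 mixture of chi-square(2) and chi-square(1) tails,
    mirroring pvaluemixture in the macro above."""
    return 0.5 * chi2_sf(lrt, 2) + 0.5 * chi2_sf(lrt, 1)

# A likelihood-ratio statistic of 0 gives p = 1; larger statistics shrink p.
print(mixture_pvalue(0.0))                        # 1.0
print(mixture_pvalue(5.0) < mixture_pvalue(2.0))  # True
```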

Contact Information:
Barbara Bardenheier, MPH, MA
Centers for Disease Control and Prevention
1600 Clifton Rd, MS E-52
Atlanta, GA 30333
Tel : (404) 639-8789
Fax : (404) 639-8614
Email : bfb7@cdc.gov

Hello! And welcome to the grand opening of Get, Grow, Keep, our spanking new group blog focused on the remarkably old business problems every marketer faces: namely, how to get, grow, and keep your best customers.

The Timeline of innovation on Wikipedia’s History of marketing entry makes it clear: the accelerating pace of technological innovation is a double-edged sword. Increasingly, we have new ways to glean insights about our customers, new ways to interact with our customers, and new ways to measure and improve our performance. But all these “new ways” also mean we marketers face other “news” not explicitly listed on the timeline: new tools to use, new strategies to devise, and perhaps most importantly, new (customer) expectations to meet.

And here we are. As a marketing organization, we struggle with the same challenges our customers and prospects face.

We are starting to ask new questions like:

• How can we evolve as an organization and develop better relationships with today’s highly empowered customers?

And we are continuing to ask fundamental questions like:

• What is the best way to create a single view of our customers?
• How can we uncover insights about our customers to drive our marketing decisions?
• What are the best solutions and strategies for executing integrated, cross-channel communications?
• How can we best develop and view metrics to measure and improve our marketing efforts?

Answer these questions correctly and you can acquire new customers, increase the value of current customers, and retain profitable customers longer—leading to a growing and profitable customer base. Answer incorrectly and you risk high attrition rates and low acquisition rates, and more to the point, you may end up acquiring and keeping unprofitable customers too.

We offer this new blog as a platform to talk with you, not only about the challenges of getting, growing, and keeping customers, but also about current customer intelligence/CRM industry news, events, ideas, and trends. From time to time we’ll invite industry leaders, analysts, professors and customers to provide new perspectives through guest blog posts. Not surprisingly, we are determined to have a lot of fun along the way as well.

What does success look like for this blog? Simply put, we’ll be measuring ourselves against our ability to encourage conversations, lower the usual customer/corporate barriers, and give you access to our key contributors.

Fortunately we have a fantastic group of core contributors, consisting of experienced and new bloggers, covering a range of marketing roles. Our team includes:

• Justin Huntsman (that’s me), and John Balla. We’re both field marketers focused on creating opportunities for our prospects and customers to learn more about our company and solutions.
• John Bastone is our global SAS Customer Intelligence product marketing manager. He's a direct marketing enthusiast with a current focus on “all things web and social."
• Mark Chaves is the director of media intelligence solutions for our Customer Intelligence business unit. His current focus is to guide the solution strategy for our marketing mix analysis and social marketing solutions.
• David B. Thomas is our social media manager. He blogs regularly at Conversations & Connections, the SAS social media blog.
• Matt Fulk manages our database marketing team and is leading a project focusing on buy-cycle marketing and list development optimization using SAS Marketing Automation.
• Alison Bolen is the editor of blogs and social content at SAS. You’ll find her regularly at the sascom voices blog.
• Deb Orton is our director of field marketing, responsible for leading a group of talented marketers "across the chasm" from traditional marketing approaches to social, digital methods.

Our names above are linked to our LinkedIn profiles. We’d each be delighted to connect with you and to hear from you via comments to this blog, or our new group Twitter account @SAS_CI.

This blog is going to be great fun. Welcome!

FILENAME test URL "http://www.indeed.com/jobs?q=sas+programmer&limit=100&fromage=0&start=0" DEBUG LRECL=700;

DATA test;
  infile test length=len;
  input record $varying700. len;
  *****DELETE MOST LINES W/O JOB;
  if index(record, 'cmpid')=0 then delete;
  *****DELETE HEAD ADVERTISEMENT;
  if index(record, 'jobmap')=0 then delete;
run;

data test2;
  set test;
  format zip z5.;
  length state $2.;
  *****SET UP ROAD SIGNS;
  id1=index(record, 'srcname');
  id2=index(record, 'cmpesc');
  id3=index(record, 'cmplnk');
  id4=index(record, 'loc');
  id5=index(record, 'lat');
  id6=index(record, 'lon');
  id7=index(record, 'country');
  id8=index(record, 'zip');
  id9=index(record, 'state');
  id10=index(record, 'city');
  id11=index(record, 'title');
  id12=index(record, 'locid');
  *****OUTPUT VARIABLES;
  source=substr(record, id1+9, id2-id1-11);
  company=substr(record, id2+8, id3-id2-10);
  if company='na' or substr(company,1,1)="'" then company='N/A';
  loc=substr(record, id4+5, id5-id4-7);
  country=substr(record, id7+9, id8-id7-11);
  if id8+5=id9-id8-7 then zip=.;
  else zip=substr(record, id8+5, id9-id8-7);
  state=substr(record, id9+7, id10-id9-9);
  if state='na' or substr(state,1,1)="'" then state='';
  city=substr(record, id10+6, id11-id10-8);
  if city='na' or substr(city,1,1)="'" then city='';
  title=substr(record, id11+7, id12-id11-9);
  drop record id1-id12;
  index=_n_;
run;

ods rtf file='d:\sas.rtf';
proc sort data=test2 out=test3;
  by state city;
run;
proc print data=test3;
  var index title company state city source;
  title "SAS programmer opens on &sysday, &sysdate";
  title2 "Collected at &SYSTIME";
  footnote "Created by Charlie Huang";
run;
ods rtf close;
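The DATA step above locates each field by searching for marker strings and slicing between them with index/substr arithmetic. The same pattern can be sketched in Python; the record string and the extract helper below are made-up illustrations, not an actual Indeed response or API:

```python
def extract(record, start_marker, end_marker, pad):
    """Return the text between two markers, trimming `pad` characters of
    quoting/punctuation on each side, mirroring the substr(record, id+k, ...)
    arithmetic in the DATA step."""
    i = record.find(start_marker)
    j = record.find(end_marker)
    if i < 0 or j < 0:
        return None
    return record[i + len(start_marker) + pad : j - pad]

# Hypothetical record in the marker:'value' style the DATA step assumes
record = "srcname:'Indeed',cmpesc:'Acme Corp',cmplnk:'http://example.com'"
source = extract(record, "srcname", "cmpesc", 2)   # skips ":'" and trims "',"
company = extract(record, "cmpesc", "cmplnk", 2)
print(source, "|", company)  # Indeed | Acme Corp
```

Marker-based slicing like this is brittle by design: both the SAS and Python versions break if the page layout changes, which is why the DATA step first filters out lines lacking the expected markers.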

Sometimes it is very handy to have a macro variable containing the variable names of a dataset. Here are two different ways to create a macro variable with the list of variable names.

*Method 1: Using PROC CONTENTS and PROC SQL;
proc contents data=sashelp.class out=class;
run;
proc sql noprint;
  select distinct(name) into :vars separated by " " from class;
quit;
%put &vars;

*Method 2: Using SASHELP tables and PROC SQL;
data class;
  set sashelp.vcolumn(where=(libname="SASHELP" and memname="CLASS"));
  keep name;
run;
proc sql noprint;
  select distinct(name) into :vars separated by " " from class;
quit;
%put &vars;


Every person in my department at SAS (User and Customer Marketing) is involved in supporting SAS Global Forum in one way or another. One of the major roles is to coordinate the SAS Demo Area. I recently had a chat with my co-worker Katie Strange, who is responsible for teaming with many others at SAS to make the Demo Area a reality.

1. When did you start planning the SAS Demo Area?
From mid-October to April is when the planning is most intense. However, like many events, demo area planning is an ongoing process and overlaps with the current and future shows.

2. How long does it take to build the SAS Demo Area, once you get on-site?
It takes an abundance of people with an incredible amount of talent to take the demo area from concept to reality. Our core team is a group of exceptional folks in Art and Scenic Operations, IT, Materials Management, Corporate Creative, Marketing and R&D. The demo area is a magnificent example of how SAS employees work in a cross-functional environment for the betterment of SAS and our customers.

But to answer your question - we start with a blank canvas at 6 am on Thursday morning, April 8, and are slated for the demo area to be set by 1 pm on Sunday, April 11. We open for attendees at 10 am on Monday, April 12.

3. What have you learned over the past few years that might surprise people?
I have learned that our show floor is a living, breathing and ever-evolving creature. Although the demo area takes many shapes and forms over the years, the one constant is the fact that this is an area that brings together a vast array of people because of their passion for knowledge and SAS. To me, that’s a beautiful thing.

4. Who can we find in the SAS Demo Area?
You will always be able to find SAS experts, such as SAS developers who are available and eager to help in any way they can. And at the Monday Night SAS Mixer you can find a few surprises, such as a lipsologist, a masseuse and much more! Don’t miss it!

5. What’s always been popular, and what’s new this year?
Perennial favorites are the Demo Alley, where customers can have one-on-one time with SAS developers, the bookstore in the Publications area, the SAS Education Center, and the SGF postcards booth.

There are also several exciting, new things this year such as the Innovations Wall and the Twitter Wall. The Innovations Wall is a place where attendees can view some of SAS’ innovations and contribute their own stories about how they have used SAS in innovative ways. The Twitter Wall will be displayed above the entrance to the Demo Area and will stream live tweets about the conference. There will also be a social networking booth near the Twitter Wall that is a resource for individuals who want to make the transition into the social networking realm but haven’t quite taken that leap yet.