Aug 03 2010
 
We're taking a break from posting for most of August. We'll be back in a month with new examples, including R- and SAS-applicable tricks and tools.

Please drop us any ideas in the comments or by e-mail. We love feedback of any kind.

Sustained high temperatures: everyone agrees it's hot (analysis running from July 30 to August 5)

Aug 03 2010
 

The bullet graph (bullet chart) was developed by Stephen Few as an evolution of the bar chart, reportedly inspired by thermometer charts and the progress bars on dashboards. It offers an alternative graphic for dashboards: compared with the gauges and meters commonly used to display KPIs, it needs far less space, stays clean, and conveys more information. Here I use it to show how comfortable the summer weather is. The temperature-humidity index (THI) describes, as a function of temperature and humidity, how comfortable people find the weather. Clearly, for southern cities such as Wuhan, Changsha, and Shanghai, temperature alone is an unreliable gauge of whether the weather is agreeable. Taken in full, how people experience a thermal environment depends on many factors: besides physiology, the meteorological ones include temperature, humidity, wind speed, wind direction, air pressure, radiation, and so on. But piling on factors makes an index complicated and hard to apply in practice. So this blog uses the two easily obtained variables, temperature and humidity, with the THI formula: THI = T - 0.55*(1 - RH)*(T - 14.5), where T is the air temperature in degrees Celsius and RH is the relative humidity, read from a wet-and-dry-bulb hygrometer and entered as a fraction (75% becomes 0.75).

The data in the chart above were measured at Shanghai Hongqiao Airport at 7:30 a.m. on August 3, 2010: temperature 33°C and relative humidity 75%, giving a THI of 30.5. THI reflects how a population as a whole perceives the thermal environment. In summer, when THI is below 21.1 almost no one feels uncomfortable; from 21.1 upward, the share of people feeling uncomfortable grows with THI; at 23.9, about 50% of people feel uncomfortable; at 26.6, almost no one feels comfortable; and at 29.5 the heat is unbearable. Several governments have even used this last value as a threshold for suspending work at factories and other businesses (the reference dates from 1981).
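For the curious, the formula drops straight into a DATA step; here is a minimal sketch (data set and variable names are my own) that reproduces the Hongqiao reading:

 data thi; 
    input temp rh;                             /* Celsius; RH as a fraction */ 
    thi = temp - 0.55*(1 - rh)*(temp - 14.5);  /* 33C at 75% RH gives 30.5 */ 
    datalines; 
 33 0.75 
 ; 
 run; 

 proc print data=thi noobs; run;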

 Posted by at 12:00 AM
Aug 02 2010
 
Contributed by Scott Vodicka, a member of the SAS Global Consulting Business Intelligence Practice

The other day I needed to go to the SAS Customer Support Website and an article caught my eye. The title of the article is FAQ: How to make user-defined formats available in a SAS Business Intelligence deployment. So I think to myself... you know, every time I need to specify a custom SAS format catalog for a BI environment, I have to go look it up just to make sure I get it done right. I do not know why, but I do.

I will let you read the SAS Note to get all of the details, but here are a couple of points about what has changed for specifying user-defined format catalogs in your SAS BI environment. As we all know, or in my case have to look up, you have to place your user-defined format catalog, named formats.sas7bcat, in the SASFormats directory. (Your SASFormats directory should be located in a path similar to C:\SAS\Config\Lev1\SASApp\SASEnvironment\SASFormats.) Next, update the sasv9_usermods.cfg file (new in SAS 9.2) in your Application Server directory, which is in a path like C:\SAS\Config\Lev1\SASApp, with the following lines to register your format catalog:

-set MyFmtLib "SASEnvironment/SASFormats"
-insert fmtsearch MyFmtLib

The insert statement is new to SAS 9.2.
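
As an aside, the catalog itself is typically built with PROC FORMAT pointed at that directory; a minimal sketch (the FMTLOC libref and the YESNO format are purely illustrative):

libname fmtloc "C:\SAS\Config\Lev1\SASApp\SASEnvironment\SASFormats";

proc format library=fmtloc;   /* writes fmtloc.FORMATS, i.e. formats.sas7bcat */
   value yesno 0='No' 1='Yes';
run;

In a new server session you can then confirm the search path with proc options option=fmtsearch; run;.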

The SAS Note goes on to explain how you can use a different name for your format catalog, and how to use a different directory if you do not want the standard one in your SAS configuration; a sketch follows below. Oh, one more thing: the note also compares the SAS 9.2 methods with the SAS 9.1 methods, so if you are still using SAS 9.1, this note will help you out too!
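For example, if you kept a catalog named myfmts.sas7bcat in C:\MyFormats instead, then, as I read the note, the config lines would become something like this (the path and catalog name here are made up), since fmtsearch accepts a libref.catalog pair:

-set MyFmtLib "C:\MyFormats"
-insert fmtsearch MyFmtLib.myfmts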

Thank you Technical Support for this helpful SAS Note!

Note: Scott has also contributed the post SAS gets your attention with a text message.
Aug 02 2010
 
Link: http://support.sas.com/resources/papers/proceedings10/096-2010.pdf

When you run a chi-square test with PROC FREQ and a warning appears beneath the contingency table, it means that more than 20% of the table's expected cell counts are below five, so the procedure recommends using Fisher's exact test instead. The user then has to go back to the program, request the exact test after the TABLES statement, and rerun PROC FREQ to obtain the Fisher's exact test results. At SAS Global Forum 2010, Wei Xu presented a macro that decides automatically whether Fisher's exact test is needed and, when it is, produces the test results in the same run, so the user never has to rerun PROC FREQ.
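For reference, here is the manual two-pass workflow the macro automates, as a rough sketch with made-up data set and variable names:

 proc freq data=mydata;      /* first pass: the warning may appear under the table */ 
    tables r*c / chisq; 
 run; 

 proc freq data=mydata;      /* second pass, only if the warning appeared */ 
    tables r*c / chisq; 
    exact fisher;            /* request Fisher's exact test */ 
 run;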



The macro is called %RUN_FISHERS, and only four parameters need to be set:

  • Data= the name of the data set for the chi-square test
  • Row= the name of the variable placed on the rows
  • Col= the name of the variable placed on the columns
  • Count= if the data are already tabulated as a contingency table, there is an extra count variable giving the frequency in each cell; name that variable here. If the data are not in contingency form, leave this blank.
Example:
 data test; 
   input r c ct; 
   datalines; 
   1 1 12 
   1 2 5 
   1 3 6 
   2 1 13 
   2 2 2 
   2 3 4 
   ; 
 run;   
%run_fishers(data=test, row=r, col=c, count=ct);
Because this data set triggers the warning, the Fisher's exact test results are shown directly at the bottom of the report. If the data instead become:
data test; 
input r c ct; 
datalines; 
1 1 12 
1 2 50 
1 3 60 
2 1 13 
2 2 20 
2 3 40 
; 
run;   
%run_fishers(data=test, row=r, col=c, count=ct);
then the Fisher's exact test results no longer appear at the bottom of the report.

Finally, here is the macro's source code:
%macro run_fishers (version, data=, row=, col=, count=); 
%let _version=1.0; 
%if &version ne %then %put RUN_FISHERS macro Version &_version; 
%let opts = %sysfunc(getoption(notes))  
            _last_=%sysfunc(getoption(_last_)); 
%if &version ne debug %then %str(options nonotes;); 
 
/* Check for newer version */ 
%if %sysevalf(&sysver >= 8.2) %then %do; 
%let _notfound=0; 
 
filename ver url 'http://ftp.sas.com/techsup/download/stat/versions.dat' termstr=crlf; 
 data _null_; 
    infile ver end=_eof; 
    input name:$15. ver; 
    if upcase(name)="&sysmacroname" then do; 
       call symput("_newver",ver); stop; 
    end; 
 
 
    if _eof then call symput("_notfound",1); 
run; 
  %if &syserr ne 0 or &_notfound=1 %then 
    %put &sysmacroname: Unable to check for newer version; 
  %else %if %sysevalf(&_newver > &_version) %then %do; 
    %put &sysmacroname: A newer version of the &sysmacroname macro is available.; 
    %put %str(         ) You can get the newer version at this location:; 
    %put %str(         ) http://support.sas.com/ctx/samples/index.jsp; 
  %end; 
 %end; 
  
proc freq data=&data noprint; 
  %if &count ne %then %str(weight &count / zeros;); 
  tables &row*&col / sparse outexpect out=_out1; 
run; 
 
proc means data=_out1 noprint; 
  var count; 
  output out=_out2; 
run; 
 
data _null_; 
  set _out1; 
  if expected<=5 then warn+1; 
  if _n_=1 then set _out2; 
  pct_lt5=warn/_freq_; 
  if _freq_=_n_; 
  warning=(pct_lt5>=.2); 
  call symput('warning',warning); 
run; 
 
options &opts; 
 
proc freq data=_out1; 
  weight count / zeros; 
  tables &row*&col / chisq; 
  %if &warning=1 %then %do; 
  exact fisher; 
  %end; 
run; 
 
%mend; 

CONTACT INFORMATION 
Wei Xu
Boston Scientific
100 Boston Scientific Way
Marlborough, MA 01752-1234
(508) 683-4264
wei.xu@bsci.com
Jul 30 2010
 
Terry Woodfield is teaching his new course, Text Analytics with SAS Text Miner, at the upcoming M2010 Data Mining Conference. Terry has been a SAS instructor for more than 10 years and has attended several Data Mining Conferences. He took some time out of his busy schedule to answer a few questions about this course and M2010.

1. How does Text Analytics with SAS Text Miner differ from other Text Mining courses that SAS offers?

TW: Other text analytics courses for SAS Content Categorization and SAS Sentiment Analysis are under development and should be available soon. There is some overlap among the text analytics areas. My course addresses general text analytics topics, but it covers only solutions that use SAS Text Miner.

2. Text analytics seems to be a hot topic this year. Can you tell us how this course addresses some of the latest trends in the field of text analytics?

TW: The definitions of text mining and text analytics have changed over the years because of better algorithms and faster computers. However, the details of algorithmic and technical trends in text analytics are not all that exciting to a typical user. A user is not so interested in factor rotations of concept vectors derived using Latent Semantic Analysis. The user is more interested in how the software learns concepts and topics from document collections and uses the derived topics and concepts to characterize a document, either for exploration and discovery, or for predictive modeling. Forensic linguistics uses text mining to identify criminals like Ted Kaczynski, the Unabomber. Warranty analysis to satisfy the TREAD Act uses text mining to find concepts and topics that are highly correlated with automotive warranty problems that can lead to serious accidents. Technical support call centers use text mining to develop methods to automatically route problems to an appropriate expert. The latest trends encompass both the technology and the application of text analytics. More and more companies are realizing that text analytics can significantly improve business decisions. The course examines the current major application areas and provides data and example analyses to illustrate how text mining can be used to solve real problems.

3. Who should attend this course?

TW: Anyone involved in analytics with access to textual data. Examples include: complaints or requests from call center contacts; descriptions of warranty problems; adjuster notes tied to insurance claims; physician reports tied to insurance claims or health studies; news reports collected from the Internet; customer requests posted on company customer support Web sites; adverse event reports in operations at nuclear power plants, chemical plants, or refineries; adverse event reports in the health sciences; adverse event reports in transportation; forensic evidence in the form of ransom notes, manifestos, or other voice or written communication; homogeneous document collections such as the MEDLINE medical abstracts.

4. Why should M2010 attendees consider taking this course?

TW: If you have access to textual data and you license SAS Text Miner or are considering licensing SAS Text Miner, you should take the course. Even if you just want to see what text analytics is all about, you should consider taking the course. As an added bonus, if you are going to be in Las Vegas all week, taking a two day course will give you an extra free day to lose more money in the casinos.

5. You’ve attended several Data Mining Conferences. What have been some of the highlights over the years?

TW: A few talks stand out in my mind. I enjoyed listening to Tom Mitchell of Carnegie Mellon University talk about the use of pattern recognition methods to detect tumors at M2003. I especially enjoyed hearing Edward Wegman of George Mason University describe methods for visualizing neural networks and other complex predictive models at M2002. Herb Edelstein of Two Crows Corp. is always interesting and informative no matter what topic he is addressing. His talk at M2001 describing misconceptions and pitfalls in data mining was particularly good.

6. Which speakers are you looking forward to seeing at this year’s conference?

TW: I’ve always enjoyed listening to John Elder relate his experiences providing practical data mining solutions. Will Neafsey of Ford Motor Company gave an excellent talk a few years ago about how Ford tries to anticipate customer preferences. I am eager to hear what Mr. Neafsey has to say this year. Tim Rey of Dow Chemical brings to the conference perhaps the widest breadth of experience using analytics to solve real business problems. Cailyn Clark is an expert at applying Text Analytics to solve business problems, so it is not surprising that I don’t want to miss her talk. I always learn something from Russell Albright and Leonardo Auslender of SAS.

7. You’ve been an instructor for SAS for more than 10 years. What is your favorite part of your job?

TW: Standing in front of a trapped audience telling bad jokes and pretending to know what I am talking about.
Jul 30 2010
 
An Economic Approach for a Class of Dimensionality Reduction Techniques

Just back from KDD2010. Several papers at the conference interested me.

On the computation side, Liang Sun et al.'s paper [1], "A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques," caught my eye. Liang proves that a class of dimension reduction techniques that rely on a generalized eigenvalue decomposition, such as CCA, OPLS, and LDA, can be computed much more cheaply by decomposing the original computation into a least-squares problem and a much smaller eigenvalue decomposition problem. The equivalence of their two-stage approach and the direct eigenvalue decomposition is rigorously proved.

This technique is of particular interest to people like me who have only limited computing resources, and I believe it would be worthwhile to implement their algorithm in SAS. For example, a canonical discriminant analysis using the above idea is demonstrated below. Note also that by specifying the RIDGE= option in PROC REG, a regularized version can be implemented as well (a sketch follows the code below); besides, PROC REG is multi-threaded in SAS. Of course, the computational advantage is only appreciable when the number of features is very large.

The canonical analysis result from the reduced-version PROC CANDISC is the same as that from the full version.

In fact, this exercise is the answer to Exercise 4.3 of The Elements of Statistical Learning [2].

[1] Liang Sun, Betul Ceran, Jieping Ye, "A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques," KDD 2010, Washington, DC.

[2] Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning, 2nd edition, Springer.





   proc format; 
      value specname 
         1='Setosa    ' 
         2='Versicolor' 
         3='Virginica '; 
   run; 
 
   data iris; 
      title 'Fisher (1936) Iris Data'; 
      input SepalLength SepalWidth PetalLength PetalWidth 
            Species @@; 
      format Species specname.; 
      label SepalLength='Sepal Length in mm.' 
            SepalWidth ='Sepal Width in mm.' 
            PetalLength='Petal Length in mm.' 
            PetalWidth ='Petal Width in mm.'; 
      symbol = put(Species, specname10.); 
      datalines; 
   50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3 
   63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2 
   59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2 
   65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3 
   68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3 
   77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3 
   49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2 
   64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3 
   55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1 
   49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1 
   67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1 
   77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2 
   50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1 
   61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1 
   61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1 
   51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1 
   51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1 
   46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1 
   50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3 
   57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1 
   71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3 
   49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1 
   49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1 
   66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1 
   44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2 
   47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2 
   74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1 
   56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3 
   49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1 
   56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2 
   51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3 
   54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3 
   61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3 
   68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1 
   45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1 
   55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1 
   51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2 
   63 33 60 25 3 53 37 15 02 1 
   ; 
   proc candisc data=iris out=outcan distance anova; 
      class Species; 
      var SepalLength SepalWidth PetalLength PetalWidth; 
   run;
 
  ods select none;
  proc glmmod data=iris  outdesign=H(keep=COL:);
           class  Species;
     model SepalLength=Species/noint;
  run;  

  data H;
          merge H   iris;
  run;

/**************************
for efficiency consideration, a view can also be used:
data H/view=H;
     set iris;
     array _S{*} Col1-Col3 (3*0);     
     do j=1 to dim(_S); _S[j]=0; end;
     _S[Species]=1;
     drop j;
run;
****************************/
  proc reg data=H  outest=beta;
          model Col1-Col3 = SepalLength SepalWidth PetalLength PetalWidth;
    output   out=P  p=yhat1-yhat3;
  run;quit;
  ods select all;


  proc candisc  data=P;
          class Species;
    var   yhat1-yhat3;
  run;
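
On the regularized variant mentioned above: with RIDGE=, PROC REG writes the ridge coefficients to the OUTEST= data set as rows with _TYPE_='RIDGE', so the projected responses have to be scored from those coefficients rather than taken from the OUTPUT statement, which reflects the OLS fit. A rough sketch, with an arbitrary ridge constant:

  proc reg data=H outest=beta_r ridge=0.1;
          model Col1-Col3 = SepalLength SepalWidth PetalLength PetalWidth;
  run;quit;
  /* score the _TYPE_='RIDGE' rows of BETA_R (e.g. with PROC SCORE) to get
     regularized yhat1-yhat3 before feeding them to PROC CANDISC */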

 Posted by at 12:19 PM

Jul 29 2010
 
During the week of August 1-7, 2010, SAS Customer Intelligence will be making a big splash in the New York area, and there are several ways you can catch up with us in person:

CRM Evolution Conference & Exhibition
This event begins on Monday, August 2 and ends on Wednesday, August 4. Visit us at any time during the show in booth 117, where we will showcase our Customer Intelligence solutions. SAS is proud to be a platinum sponsor of this major annual event, and we've invited two of our customers to share their experiences with driving marketing success using customer analytics:

1-800FLOWERS.com's Nachiket Desai will be in Room A103 at 2:15 p.m. on Monday, Aug. 2.
Wyndham Worldwide's Sean Lowe will be in Room C202 at 11:45 a.m. on Tuesday, Aug. 3.

Registration is being handled by the show organizer, and SAS customers can get a 25% discount on admission by using the "VIPSAS" code. REGISTER HERE

The SAS New York Tweetup & Web Analytics Monday Reception
This hospitality reception will be held in the Basement Lounge of the Houndstooth Pub on Monday, August 2, from 6:30 p.m. until 8:30 p.m. It will be a great opportunity to mingle with marketers attending the conference, as well as New York members of the Web Analytics Association, with whom we're working to promote this event. We'll have a drawing for a Flip video recorder, so come join us! REGISTER HERE.

CRM Evolution Executive Breakfast
This exclusive event is happening on Tuesday, August 3, from 7:45 a.m. until 8:45 a.m. at the Marriott Marquis Hotel in Times Square. SAS is hosting executive attendees of CRM Evolution and SAS executive-level customers in New York City for a closed-door session with Forrester analyst Dave Frankland and CRM Media's editorial director David Myron. REGISTER HERE

The Strategic Customer Intelligence Dinner
We're coming across the river on Thursday, August 5, to host an executive dinner at Morton's Steakhouse in Hackensack, NJ, in "The Boardroom," their private dining room. The evening will begin at 6:00 p.m. with refreshments served during a networking reception, followed by remarks from Forrester analyst Dave Frankland. Dinner will follow, featuring Morton's world-famous steak along with chicken, salmon, or a vegetarian option. After dinner, Jeff Hoffman of Chubb & Sons insurance company will highlight his company's experiences in driving success using customer analytics. John Bastone of SAS will provide closing comments to wrap up the evening at 9:00 p.m. REGISTER HERE.

This event will also be the big unveiling of the new creative theme we plan to use this year, centered on apples. Our creative team peeled away at it and got to the core of the true essence of SAS Customer Intelligence. Come by booth 117 to check it out; we'll also be handing out Green Apple Jolly Rancher candies.

If you can't be with us in the Big Apple, be with us virtually by following our official Twitter account: @SAS_CI, or follow the CRM Evolution list on Twitter: @CRMevolution/crme2010, or the CRM Evolution Group on LinkedIn.
Jul 29 2010
 
As a marketer, I spend all day, every day thinking about how to express the ways my company's products meet market and customer needs. I work hard on the prose, simplify the diagrams, and focus especially on the underlying issues.

That said, I have a sneaking suspicion that I could be doing better, and here's why: is my positioning really doing that good a job of reflecting what customers have been telling us they need?

The challenge is that every customer expresses their specific needs in their own specific language, and we have to translate that into a description of a requirement we can build or provide the capability for, and do so in a standardised, scalable fashion.

Think for a while about buying a computer: relatively few people will express what they need as a list of components or technical features. They tell us what they want to use the computer for and where and how it will be used, and they expect us to work out the details and propose the best solution that meets their needs. On top of that, they are bombarded with conflicting advice and recommendations from a host of different sources. Sound familiar?

Thankfully, customer analytics can help us understand that myriad of requirements, find the patterns in otherwise unique conversations that help us address those requirements, and do so in a way that plays to our strengths. It can even help us understand how good a job we are doing of satisfying our customers.

The formula is simple enough - the customer will more fully understand the value when they know that your proposal has accurately addressed and reflected their expressed needs in the language with which they find comfortable. To paraphrase Clay Shirky at a recent conference - let's not spend our time trying to educate the customer - I would rather educate myself.