December 8, 2009
 
support.sas.com is loaded with detailed information about SAS 9.2. (I'll include a few references at the end of this post.) We heard you say that you were looking for more general information about SAS 9.2; the 5,000-foot view, if you will. I picked 5,000 feet because I believe that you can get the 10,000-foot view on www.sas.com and the 1,000-foot and lower view in the depths of support.sas.com.

In response to your comments, we rounded up people from all over SAS and collected their input. From that, we updated the SAS Software section. The landing page (to use a Weby term) has high-level information with pointers into the sections in which you might be interested. The right column offers some other resource links. The meat of this content appears behind the SAS 9.2 link found in the left navigation column.

The SAS 9.2 overview content is broken down into five major areas of interest, each listed in the left navigation column. The right column is again a resource list. Only this time, the resources are much more detailed. We broke them down into documentation (information for product users), installation (information for Installation Reps and IT departments), and migration (information for all three groups).

The most recent addition to this content is the 9.2 Maintenance information. The Release Announcement page will keep you up-to-date on the current maintenance release, define maintenance, and point you to the self-service application for maintenance. [One thing to remember: Only a SAS Installation Representative can request maintenance for your site.]

After all of that, we still heard you say that you couldn't find general SAS 9.2 information. We didn't get comments indicating that the content was not what you were looking for. The comments we did receive indicated that you couldn't find the information, especially the maintenance information. So, starting the first week in December, you can find a SAS Software highlight box on the right side of the support.sas.com Home page. The image to the right points out the SAS 9.2 box.

Resource list follows....
Continue reading "Finding 9.2 Content"
December 5, 2009
 
Contributed by Dwight Mouton, Customer Intelligence Product Management, SAS

I would consider myself a SAS User rather than an "IT" type, which adds two perspectives to my story.

First, there will be a little less detail than what my Sales Engineer friends might provide.
Second, this proves that a non-IT, though somewhat technical, SAS user can install and configure SAS 9.2 and Marketing Optimization 5.3.

Of course, this install was on my laptop, so I'm not recommending that an end user try this on a complex multi-server system. It's best to leave that to the IT professionals and the SAS Engineers so that we SAS Users can focus on our real jobs.

That said, I hope this write-up is useful to those who install and configure SAS for a living.

Before you get any software, be it CDs or a downloaded "Software Depot," you can set up your Users and User Groups. For SAS 9.2 and Marketing Optimization 5.3 this is a critical step. User groups give you much more control over security, and if certain groups are not created, Marketing Optimization will not work. Luckily for me, I work with the gurus and had it pounded into my head that I HAD to set these up. Since this was my own machine, I simply put all the SAS users (myself plus the SAS-created users) into a Marketing Optimization User Group. Of course, you could get more complex and set up several levels of user access.

Once everything was set up and I had double-checked my pre-installation checklist, I started the installation of the application server. Since the gurus around me had the most experience with WebLogic, that's what I went with. This is also where I made my first error. WebLogic is installed via the CMD prompt and requires two commands: one to set the path for the install and the other to start the install. The first time through, I forgot to set the path. Everything looked OK, so I started the SAS Deployment Wizard. I got through most of the server install, but then it errored out. I went back, realized the error of my ways, and started over.

The second time through, WebLogic was more obvious in telling me something was wrong by giving me a popup error window. Going back to my checklist, I realized that I hadn't fully installed the Java components. There are two files that need to be installed: one ends in JRE and the other in JDK. These are quick installs, but to make sure everything was going to work I had to uninstall the SAS components that had already completed. Lesson learned: read your instructions very carefully. This is not like putting together a child's bicycle on Christmas morning, where you can pretty much wing it based on experience (wheels on bottom, handlebar and seat on top).

In starting over, I also realized that the scheduler, the LSF Platform Suite, does not install as part of the Deployment Manager. To make sure it was in place ahead of going through the configuration, I decided to install it right away. It installed with no issues.

Continue reading "SAS 9.2 and Marketing Optimization 5.3 Deployment - A personal story"
December 4, 2009
 

(Note: Everything that follows was tested in a Windows XP environment.)

0. Install Oracle Database 10g Express Edition

Fast (and free) to download, easy to deploy, and simple to administer: for learning and testing purposes, Oracle Database 10g Express Edition (Oracle Database XE, a mini version of Oracle Database 10g Release 2) is strongly recommended:

0.1 Download it at its homepage (206 MB);

0.2 Install it with the default settings;

0.3 Unlock the HR and SCOTT accounts. In a Windows command prompt, run the following:

sqlplus sys/YourSysPassword as sysdba
alter user HR account unlock;
alter user HR identified by YourHRPassword;
alter user SCOTT account unlock;
alter user SCOTT identified by TIGER;
Or you can accomplish these tasks within the Oracle web application:
http://localhost:8081/apex

1. Connect Oracle using SAS libname engine

libname SCOTT oracle user="SCOTT" password="TIGER" path='xe';

Note: xe is the default path for Oracle Database XE.
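Once the libref is assigned, Oracle tables can be queried like ordinary SAS data sets. A minimal sketch (it assumes the standard EMP demo table exists in the SCOTT schema):

```sas
/* Browse the first five rows of SCOTT.EMP through the libname engine */
proc print data=scott.emp(obs=5);
run;
```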

2. Connect Oracle using SQL Procedure Pass-Through Facility

proc sql;
connect to oracle as orcl
   (user="SCOTT" password="TIGER" path='xe');

select *
from connection to orcl
   ( SELECT … );

disconnect from orcl;
quit;

Note: 1) In this approach Oracle, instead of SAS, processes the SQL statement; i.e., you use the more powerful and flexible Oracle SQL syntax instead of SAS PROC SQL syntax for querying. (For more, see the SAS documentation.)

2) Your Oracle SQL code should be placed inside the parentheses (the inner query) and must end without a semicolon (;):

select *
from emp

3) Only basic Oracle SQL statements (select, create table, ...) can pass through this facility.
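Putting the pieces together, a complete pass-through query might look like the sketch below (ENAME, JOB, and SAL are columns of the standard EMP demo table; adjust the names to your own schema):

```sas
proc sql;
connect to oracle as orcl
   (user="SCOTT" password="TIGER" path='xe');

/* The inner query is Oracle SQL and ends without a semicolon */
select *
from connection to orcl
   ( SELECT ename, job, sal
     FROM emp
     WHERE sal > 2000
     ORDER BY sal DESC );

disconnect from orcl;
quit;
```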

3. Load SAS datasets to Oracle database

proc copy in=sashelp
out=scott;
select class;
run;
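The same load can also be done with a DATA step through the libref; this sketch writes SASHELP.CLASS to a new Oracle table named CLASS:

```sas
data scott.class;
   set sashelp.class;
run;
```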

4. Misc: Some differences between Oracle SQL and SAS Proc SQL

4.1 Table aliases

Oracle: from employees a;

SAS: from employees a; or

from employees as a;

4.2 Column aliases

Oracle: single quotation marks ('') are not allowed; the first line below is valid, the second is not:

select job_id as job, job_id job, job_id as "job"   /* valid */
select job_id as 'job'                              /* invalid in Oracle */

SAS: all of these forms are accepted:

select job_id as job, job_id "job", job_id 'job', job_id job

4.3 Literal Character Strings

Oracle: Date and character literal values must be enclosed in single quotation marks ('').

SAS: You can use either single or double quotation marks.

4.4 Order the null value

Oracle: Null values are displayed last for ascending sequences and first for descending sequences.

SAS: Null values are displayed first for ascending sequences and last for descending sequences.
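A quick way to see the SAS behavior is to sort a table with missing values (a sketch using SASHELP.CARS, where CYLINDERS is missing for rotary-engine cars):

```sas
proc sql;
   /* Missing (null) CYLINDERS values appear first in ascending order */
   select make, model, cylinders
   from sashelp.cars
   order by cylinders;
quit;
```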

4.5 Nesting Group Functions

select max(avg(salary))
from employees
group by department_id

SAS log:

ERROR: Summary functions nested in this way are not supported.

An alternative approach in SAS:

select max(avg) as max
from(
select avg(salary) as avg
from employees
group by department_id )
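The same workaround pattern can be tried on a SAS-supplied table; a sketch using SASHELP.CLASS that takes the larger of the per-sex average heights:

```sas
proc sql;
   select max(mean_height) as max_avg
   from ( select avg(height) as mean_height
          from sashelp.class
          group by sex );
quit;
```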

4.6 row number(_n_)

Oracle:

select rownum
from employees

SAS:

select monotonic()
from employees

Note: monotonic() is an undocumented SAS function; see http://www2.sas.com/proceedings/sugi29/040-29.pdf
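For example, to attach a row number to each observation (since monotonic() is undocumented, its behavior is not guaranteed in every context):

```sas
proc sql;
   select monotonic() as row_no, name
   from sashelp.class;
quit;
```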

Applying Delta Method in SAS

December 2, 2009
 


The delta method is a statistical technique for deriving an approximate probability distribution for a function of an asymptotically normally distributed random variable, given the limiting variance of that random variable.
See http://en.wikipedia.org/wiki/Delta_method.
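In the multivariate form used later in this post, if the estimator b is asymptotically normal with covariance matrix V(b), then for a smooth function g:

```latex
% Multivariate delta method: gradient "sandwich" approximation
\operatorname{Var}\bigl[g(b)\bigr] \;\approx\; \nabla g(b)^{\top}\, V(b)\, \nabla g(b),
\qquad
\nabla g(b) = \left(\frac{\partial g}{\partial b_1},
                    \frac{\partial g}{\partial b_2},
                    \frac{\partial g}{\partial b_3}\right)^{\top}
```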

In order to apply the delta method, it is necessary to obtain the first-order derivatives of the function with respect to the random variables, which requires either symbolic computation or a finite-difference approximation. SAS doesn't explicitly have symbolic computing power, nor does it output intermediate results from finite-difference computations, so we have to turn to PROC MODEL in SAS/ETS (or PROC NLIN in SAS/STAT) to obtain the symbolic derivatives of a given function F(.). PROC MODEL outputs three pieces of related information:

1. Internal machine code for computing each part of the derivatives, say g(b); this is the table called CodeList.
2. The summary symbolic result for the first-order derivatives of F(.), i.e., the complete symbolic expression of g(b); this is the table called ProgList.
3. An indicator of the partial derivatives; this is the table called Derivatives.

The code list is the human-readable form of the internal computing code, and I use it to guide further computation of the derivatives. I also use ProgList in the first step to obtain the expression of the first-order derivatives of F(.), i.e., the expression of g(b).

Suppose you have a translog model, F(p1, p2, p3; b1, b2, b3), where p1, p2, p3 are input prices and b1, b2, b3 are parameter vectors (a real translog model has more parameters). The elasticity of substitution between input 1 and input 2 is defined as [@F()/@p1]/[@F()/@p2] = g(b1, b2). *NOTE: @ indicates the derivative operator.

In the first step, I use PROC MODEL on F(p1, p2, p3; b1, b2, b3) to estimate the parameter vector {b1, b2, b3}. This step produces the tables CodeList_0 and ProgList_0.

In the second step, I tweak PROC MODEL to obtain the partial derivatives @F()/@p1 and @F()/@p2. Then I read in the ProgList table to form the explicit AUES expression g(p1, p2 | b1, b2, b3) = [@F()/@p1]/[@F()/@p2]. This step generates the tables CodeList and ProgList.


Since the delta-method approach to estimating V(g) relies on the first-order derivatives of the function g() w.r.t. the parameter vector (b1, b2, b3), we also need the partial derivatives of g() with respect to each of the parameters, i.e., @g()/@b1, @g()/@b2, @g()/@b3. Therefore, I call PROC MODEL once again to obtain these formulas. The tables CodeList_g and ProgList_g are generated in this step.

Later, I parse the instructions in CodeList_g to calculate @g()/@b1, @g()/@b2, @g()/@b3 at every data point of the sample. Once we have these values, we can use the delta method to get the asymptotic mean and variance of the g() function:

V[g()]=[@g()/@b1, @g()/@b2, @g()/@b3]’ %*% V(b1, b2, b3) %*% [@g()/@b1, @g()/@b2, @g()/@b3]


In summary, my program only asks the user to input the original model for estimation, plus the two input prices for the cross-elasticity calculation; users don't need to specify the derivative functions themselves. If we used IML instead, we would have to specify an explicit formula for the vector [@g()/@b1, @g()/@b2, @g()/@b3], which can be very tedious to derive.

Below is sample code using data from Econometric Analysis by W. Greene, 6th ed.


data Green;
input Year   Cost    K        L      E       M        Pk      Pl      Pe     Pm;
datalines;
1947  182.373 0.05107 0.24727 0.04253 0.65913 1.00000 1.00000 1.00000 1.00000
1948  183.161 0.05817 0.27716 0.05127 0.61340 1.00270 1.15457 1.30258 1.05525
1949  186.533 0.04602 0.25911 0.05075 0.64411 0.74371 1.15584 1.19663 1.06625
1950  221.710 0.04991 0.24794 0.04606 0.65609 0.92497 1.23535 1.12442 1.12430
1951  255.945 0.05039 0.25487 0.04482 0.64992 1.04877 1.33784 1.25179 1.21694
1952  264.699 0.04916 0.26655 0.04460 0.63969 0.99744 1.37949 1.27919 1.19961
1953  291.160 0.04728 0.26832 0.04369 0.64071 1.00653 1.43458 1.27505 1.19044
1954  274.457 0.05635 0.27167 0.04787 0.62411 1.08757 1.45362 1.30356 1.20612
1955  308.908 0.05258 0.26465 0.04517 0.63760 1.10315 1.51120 1.34277 1.23835
1956  328.286 0.04604 0.26880 0.04576 0.63940 0.99606 1.58186 1.37154 1.29336
1957  338.633 0.05033 0.27184 0.04820 0.62962 1.06321 1.64641 1.38010 1.30703
1958  323.318 0.06015 0.27283 0.04836 0.61886 1.15619 1.67389 1.39338 1.32699
1959  358.435 0.06185 0.27303 0.04563 0.61948 1.30758 1.73430 1.36756 1.30774
1960  366.251 0.05788 0.27738 0.04585 0.61889 1.25413 1.78280 1.38025 1.33946
1961  366.162 0.05903 0.27839 0.04640 0.61617 1.26328 1.81977 1.37630 1.34319
1962  390.668 0.05578 0.28280 0.04530 0.61613 1.26525 1.88531 1.37689 1.34745
1963  412.188 0.05601 0.27968 0.04470 0.61962 1.32294 1.93379 1.34737 1.33143
1964  433.768 0.05452 0.28343 0.04392 0.61814 1.32798 2.00998 1.38969 1.35197
1965  474.969 0.05467 0.27996 0.04114 0.62423 1.40659 2.05539 1.38635 1.37542
1966  521.291 0.05460 0.28363 0.04014 0.62163 1.45100 2.13441 1.40102 1.41878
1967  540.941 0.05443 0.28646 0.04074 0.61837 1.38617 2.20616 1.39197 1.42428
1968  585.447 0.05758 0.28883 0.03971 0.61388 1.49901 2.33869 1.43388 1.43481
1969  630.450 0.05410 0.29031 0.03963 0.61597 1.44957 2.46412 1.46481 1.53356
1970  623.466 0.05255 0.29755 0.04348 0.60642 1.32464 2.60532 1.45907 1.54758
1971  658.235 0.04675 0.28905 0.04479 0.61940 1.20177 2.76025 1.64689 1.54978
;
run;

data translog;
     set green;
  l_C=log(cost);
     lpk = log(pk); 
     lpl = log(pl); 
     lpe = log(pe); 
  lpm = log(pm);
run;

/* Specify the prices used in elasticity measurements */
%let covar1=lpk;
%let covar2=lpl;
/* Specify the translog cost function. It can be a nonlinear function, too */
%let model = %bquote(l_c=b0 + bk*lpk + bl*lpl + be*lpe + bm*lpm +
            bkk*0.5*lpk*lpk + bll*0.5*lpl*lpl +
      bee*0.5*lpe*lpe + bmm*0.5*lpm*lpm +
      bkl*lpk*lpl + bke*lpk*lpe + bkm*lpk*lpm +
      ble*lpl*lpe + blm*lpl*lpm + bem*lpe*lpm + exp(lpk););

/*@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@*/
/*                          do not change code below                    */
/*@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@*/
/* Estimate original model */
ods select none ;
ods output FirstDerivatives=Derivatives_0;
ods output CodeList=CodeList_0;
ods output ProgList=ProgList_0;
ods output ParameterEstimates=Est;
ods output CovB=CovB;
proc model data=Translog ListCode ListDer  covb ;
     *endogenous K L E; 
     *parms  b0 bk bl be bm
               bkk bll bee bmm
               bkl bke bkm
               ble blm
               bem;     
      &model;
      *K = bk + bkk*lpk + bkl*lpl + bke*lpe + bkm*lpm; 
      *L = bl + bll*lpk + bkl*lpk + ble*lpe + blm*lpm;
   *E = be + bee*lpe + bke*lpk + ble*lpl + bem*lpm;
      fit   l_c / outsused = smatrix  ols  outcov  outest=beta ; 
   *estimate '&PRED.K/@bkk' exp(lpk*bkk);  
   run;
quit;     
ods select all;


proc sql noprint;
     select wrt into :para_names separated by ' '
  from   derivatives_0
  ;
quit;
%put &para_names;

data covb;
        length _TYPE_ $ 5;
        set covb;  
  _TYPE_='PARMS';
  rename Parameter=_NAME_;
run;
 

proc transpose data=Est out=Est_t name=Parameter;
     var Estimate;
  id Parameter;
run;

%let elastic_covars=lpk lpl;
data _new;
     set Translog(drop=&elastic_covars); set Est_t;  
run;


/* Obtain elasticity measurements */
ods select none ;
options error=0;
ods output FirstDerivatives=Derivatives;
ods output CodeList=CodeList;
ods output ProgList=ProgList;
proc model data=_new ListCode ListDer  maxiter=0;
      *endogenous Sk Sl Se; 
      &model;
      fit  l_c /  ols;    
   run;
quit;     
ods select all;

/* obtain analytical derivatives w.r.t input prices */

data _null_;
     length _temp_ $ 1024;
     set ProgList;
  if indexc(source, '@')>0 then do;
     _temp_='';
  x1=indexc(source, '/')+2; x2=indexc(source, '=')-1;
  name=substr(source, x1, x2-x1); put name=;
     _temp_=cats(_temp_, compress(substr(source, find(source, '=')+1), '; '));  
  j=_n_+1;  
     do while( indexc(source, ';')=0);
           set ProgList point=j ;
           _temp_=cats(_temp_, compress(substr(source, find(source, '=')+1), '; '));  
     j+1;
  end;
  call symput(compress('foc_'||name), _temp_);
      end;
   put _temp_=;
run;

%put %bquote((&&foc_&covar1)/(&&foc_&covar2));


/* Obtain derivatives of elasticity measurements w.r.t parameters */
data _new;
     set Translog;
     if _n_>1 then stop; 
run;
ods select none;
ods output FirstDerivatives=Derivatives_g;
ods output CodeList=CodeList_g;
ods output ProgList=ProgList_g;
proc model data=_new ListCode ListDer  maxiter=0;     
      *parms  b0 bk bl be bm
                bkk bll bee bmm
                bkl bke bkm
                ble blm
                bem;     
   nom=&&foc_&covar1;
   denom=&&foc_&covar2;
      l_c= nom / denom;
      fit  l_c / ols;    
   run;
quit;     
ods select all;
/*
proc sql noprint;
     select name into :para_names separated by ' '
  from   sashelp.vcolumn
  where  varnum>=2 
         &  libname='WORK' 
         &  memname='EST_T'
  ;
quit;
%put &para_names;
*/

data _null_;
     set CodeList_g end=eof;
  if _n_=1 then do;
     call execute('data compute;');
  call execute('if _n_=1 then set Est_T;');
  call execute('set translog;');
  call execute('retain &para_names ;');
  /*call execute('lpk = log(pk); lpl = log(pl); lpe = log(pe);' );*/

  declare hash h();
  length word $ 32 wordtrans $ 32;
  h.defineKey('word');
  h.defineData('word', 'wordtrans');
  h.defineDone();
  call missing(word, wordtrans);

  declare hiter hi('h');
  end;
  /* check if it is a Operation item */
  where index(col1, 'Oper')>0;
  if index(col1, 'eeocf')>0 then col1=tranwrd(col1, '=', 'eeocf');
     j0=index(col3, '@'); j1=indexc(col3, ':')+1; j2=indexc(col3, '<')+2;     
  s1=trim(substr(col3, 1, j1-2));
  s2=compress(substr(col3, j1, j2-j1-2));
  s3=trim(substr(col3, j2));

  s2B=translate(s2, '___', '/@.');
  *put s2=  s2B=;
  if indexc(s2, '@./')>0 then do;
     rc=h.find(key:s2);
     if rc=0 then h.replace(key:s2, data:s2, data:s2B);
     else h.add(key:s2, data:s2, data:s2B);
  end;

  j=1;
  do until (scan(trim(col3), j, ' ')='');
     w=subpad(scan(col3, j, ' '), 1, 32);  
        if h.find(key:w)=0 then do;
     word2=wordtrans;
     s3=tranwrd(s3, compress(w), compress(wordtrans));
     *put s3=  wordtrans=;
     *put '^^^^^^^^^';
  end;

  j+1;
  end;

  if s1 in ('*', '-', '+', '/', '**') then  s3=translate(trim(left(s3)), s1, ' '); 
  else if s1 not in ('eeocf', '=') then s3=compress(cats(s1, '(', s3, ')'));
  rc=h.find(key: subpad(s2, 1, 32));
  if rc=0 then  s3=cats(compress(wordtrans), '=', s3);   
  else s3=cats(s2, '=', s3);

  s3=tranwrd(s3, 'ACTUAL.', ' ');
  *put s3;  
  *put '________________';
  if index(s3, 'ERROR')=0 & index(s3, 'RESID')=0 then
     call execute(s3||';');
  if eof then do;
     call execute ('drop &para_names;');
        call execute ('run;');
  h.output(dataset:'_test');
  end;
run;



data  Mean(drop=_PRED:)   FirstDerivatives(drop=PRED:);
        set compute(keep=Year PRED: _PRED:);  
run;

proc sql noprint;
        select cats(a.name, ' = ', substr(a.name, index(a.name, '__')+2)) into:renamestmt separated by ' '
  from  sashelp.vcolumn as a
  join  _test as b
    on  compress(a.name)=compress(b.wordtrans)
  where  a.libname='WORK' and a.memname='FIRSTDERIVATIVES'
  ;
quit;
%put &renamestmt;

proc sort data=derivatives_0; by wrt; run;
proc sort data=derivatives_g; by wrt; run;
data _dev;
     merge derivatives_0(keep=wrt in=_0) 
        derivatives_g(keep=wrt in=_g)  end=eof
  ;
  by wrt;
  length _temp_ $ 1024;
  retain _temp_ ;
  diff=(_0-_g); 
  if _n_=1 then _temp_=';';
  if diff=1 then _temp_=cats(_temp_, ';', wrt, '=0');
  put _temp_=;
  if eof then call symput ('impute', _temp_);
run;
%put %bquote(&impute);


data FirstDerivatives;
        set FirstDerivatives;
  rename &renamestmt;
  &impute;
  ;
run;

proc score data=firstderivatives  score=covb  type=parms  out=_temp1;        
        var &para_names;
run;

data _temp1;
     set _temp1(drop=&para_names);
run;
proc sql noprint;
     select cats(name, '=', translate(name, '', '2')) into :renamestmt separated by ' '
  from   sashelp.vcolumn
  where  libname='WORK' 
      &  memname='_TEMP1'
   &  varnum>1
   ;
quit;
%put &renamestmt;

data _temp1_score;
     set _temp1;
  rename &renamestmt;
  _TYPE_='parms';
  _NAME_=compress('v'||year);
run;

proc score data=firstderivatives  score=_temp1_score  type=parms  out=v_G(keep=Year v:);        
     by year;
     var &para_names;
run;

data v_G;
     set v_G;
  _var_=sum(of v:, 0);
  drop v:;
run;

proc datasets library=work nolist;
     delete _temp1 _temp1_score  _new _test;
quit;
Posted at 6:55 AM
December 1, 2009
 
Email - love it or hate it. Either way, and in whatever form (please read on), you can't avoid it. SAS CMO Jim Davis blogged last week about getting 300-400 emails a day! And, while I agree that you can't avoid email, email as it is traditionally known is about to get a complete makeover.

Microsoft Outlook is just one form of email that I work with. I also get direct messages via social media sites like Twitter and Facebook. And, if I want, I can manage those messages and have them all appear in my In Box -- which then just adds to the volume of email for me to digest. Alternatively I could have all of my social media messages appear in Facebook. Or I could leave them all as is.

Which then means I have to deal with this rather unwieldy email list: Outlook, Facebook, MySpace, LinkedIn, Plaxo, Twitter, and much more!

If you thought email meant one source, forget it! It's email, Jim, but not as we know it. Soon enough, traditional email will be an ancient relic. Generation Y does not reply to messages via email. Social media site of preference, please! A succinct 140-character tweet probably sounds quite appealing to anyone who glosses over long emails.

But for an organization analyzing the masses of tweets that convey customer feedback, succinct tweets with limited grammatical accuracy will be quite a different can of worms. Which brings me right back to Text Data Quality.

Some 60-80% of any data mining effort involves data cleansing/data quality: imputing missing values, filtering outliers, etc. Text mining requires that same upfront effort, especially if you are performing unsupervised classification (grouping documents when there is no other target/ranking/outcome information available). Most of the social media text-based information that we will want to analyze will require a rather significant data quality effort - the topic of a future blog.
November 30, 2009
 
> ##################################################
> #        EXTRACT ROWS FROM A DATA FRAME          #
> ##################################################
>
> data <- data.frame(scan(what = list(x = 0, y = 0)))
1: 1 1
2: 1 2
3: 1 3
4: 2 1
5: 2 2
6: 2 3
7:
Read 6 records
>
> # EXTRACT ROWS WITH LOGICAL SUBSCRIPTS
> rows1 <- data[data$x == 2 & data$y >= 2,]
> rows1
  x y
5 2 2
6 2 3
>
> # EXTRACT ROWS WITH ROW INDEX
> rows2 <- data[which(data$x == 2 & data$y >= 2),]
> rows2
  x y
5 2 2
6 2 3
>
> # EXTRACT ROWS WITH SUBSET() FUNCTION
> rows3 <- subset(data, x == 2 & y >= 2)
> rows3
  x y
5 2 2
6 2 3
>
> # EXTRACT ROWS WITH SQLDF() FUNCTION
> require(sqldf)
> rows4 <- sqldf("select * from data where x = 2 and y >= 2", row.names = TRUE)
> rows4
  x y
5 2 2
6 2 3
Posted at 9:59 AM
November 29, 2009
 
The Thanksgiving holiday weekend is screaming to a close. I can see the faint twinkle of Christmas lights on the horizon. But it is never too late for a thankful post. I read somewhere (Reader's Digest, I think) that optimism can bring happiness, health, and friends. The article also said that one great way to be more optimistic is to be thankful. Timely, don't you think?

I am always thankful for my family and friends, for their support and love, and for time spent with them in good health. But this is a support.sas.com blog, so let me move on to topics relative to those of you reading. Thanks for reading, by the way!

I am thankful for a job that enables me to interact with so many different and interesting people. Do you remember back in college when you were picking a career and you would say "I want to make a difference"? Somehow I thought that was limited to doctors and firemen and the Peace Corps. I'll chalk that up to the short-sightedness of youth. I am none of those things, but I do work on support.sas.com, which in turn, helps SAS users world-wide be more efficient at their jobs.

We see the Customer Support Web site as a 24-hour self-serve helpdesk. The goal of my team is to make support.sas.com useful for you so that you can do the things that you do. Things that include: research cures for disease, seek out financial fraud, help protect endangered animals, and allow us to send flowers to the family and friends that we miss during the year. Yep, you all help make the world a better place and that is really cool. I'm thankful for all that you do.

Finally, I'm thankful for each of you who have taken the time to send a comment about how the Web site is or is not working for you. All of those comments and conversations add up to change in the way support.sas.com works.

Thank you for reading this blog, for your comments, and for using SAS!
November 26, 2009
 


Seth Grimes posted more information about Text Data Quality today. I really think that text data quality issues will only become more pronounced as Gen Y users move more heavily into the social media space where spelling and grammar accuracy are not a great concern to these digital producers.

The content being actively used for text analysis is, more often than not, in need of cleansing for misspellings, synonyms, acronyms, profanity and more, as mentioned in my last blog. I'd hazard a guess that over 85% of text content needs cleansing prior to analysis. Even published articles contain the occasional spellling error!

I see huge potential for using "clipped/shortened text" language dictionaries as part of our underlying NLP capabilities. I look forward to reading about Seth's continued research in this area!
November 25, 2009
 
I haven't reminded you about YouTube in a while. After reading a recent post in the Open Mic blog, I thought it might be time to nudge you toward SAS resources on YouTube. It is a great way to take a break from your normal workday while still learning about SAS and analytics.
There are some great videos out there. Here are a couple favorites from my last YouTube visit:

Visit the SAS channel on YouTube. If you have a favorite SAS-related YouTube video, share it with others. (OK now, remember that I asked for a video related to SAS!)
November 25, 2009
 
Original paper: User-Written DATA Step Functions

This is a feature newly released in SAS V9.2. Similar to a macro, it lets you define functions that SAS does not provide and then use them freely in any DATA step, without re-declaring them each time. This user-defined function capability is provided by PROC FCMP.

Let's look at an example:
proc fcmp outlib=sasuser.funcs.trial;
function study_day(intervention_date, event_date);
if event_date < intervention_date then
return(event_date - intervention_date);
else
return(event_date - intervention_date + 1);
endsub;
run;

options cmplib=sasuser.funcs;
data _null_;
start = '15Feb2006'd;
today = '27Mar2006'd;
sd = study_day(start, today);
put sd=;
run;

This example has two parts. The first part uses PROC FCMP to create a function named study_day, which takes two arguments: intervention_date and event_date. The body contains an if-else condition and, depending on which branch is taken, returns either event_date minus intervention_date, or event_date minus intervention_date plus one, in days. The function body is closed with endsub, and the result is stored in the funcs data set in the sasuser library, under the package name trial.

When you want to use it, you must first bring in sasuser.funcs with the cmplib option; after that, you can call the study_day function any number of times in any DATA step. Even after restarting SAS, there is no need to rerun PROC FCMP: just execute the line options cmplib=sasuser.funcs and you can use study_day directly.

If you want to create a CALL routine, you need to use a SUBROUTINE statement inside PROC FCMP. For example:
proc fcmp outlib=sasuser.funcs.math;
subroutine subA();
x = 5;
call subB();
put 'In subA:' x=;
endsub;
subroutine subB();
x = 'subB';
put 'In subB:' x=;
endsub;
run;

options cmplib=sasuser.funcs;
data _null_;
x = 99;
call subA();
put 'In DATA step: ' x=;
run;

In this example, PROC FCMP first uses a SUBROUTINE statement to create a call routine named subA(), which sets x=5, calls another call routine named subB(), and finally prints "In subA: x=" to the log window. But subB() is also user-written, so another SUBROUTINE statement is needed to define it. Both subroutines are closed with endsub so that subA() and subB() are packaged; the user-written subA() can then be used in the DATA step in the second part.

The original paper covers some more advanced usage, but I think the two examples above are enough for ordinary users, so I won't spend space on an advanced tutorial; if you're interested, download the original paper and study it.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author:
Jason Secosky
SAS Institute Inc.
SAS Campus Drive
Cary, NC 27513
919-677-8000
Jason.Secosky@sas.com