July 7, 2010
 
Here's something you don't expect to hear from a banking executive: "The best thing that happened is the financial crisis."

Of course, Tonny Rabjerg is not your standard banking executive. He's the Vice President of CRM Systems at Danske Bank. "I know the financial crisis is not good for the bank, but for me it is good, because there has never been more focus on customers than now. Typically, banks think risk and credit are more important, but without the customer, risk and credit don't matter," says Rabjerg.

So, how is Rabjerg taking advantage of this new focus on the customer? He's leading a shift in his organization from customer relationship management (CRM) to personal customer management (PCM), and committing five to seven years to the project. The difference between CRM and PCM involves one-to-me marketing instead of one-to-one marketing, and personal product presentations instead of campaign-based sales. "It's about matching the customer's requirements before he needs it," says Rabjerg. "I want to make sure we make it easy for him to take out a car loan before going to the car dealer."

Philippe Wallez, General Manager of Marketing, is leading similar customer-centric programs at ING Belgium. His full-scale direct-marketing project, which started in 2007, has transformed the way the bank communicates with its 2.7 million customers. Projects include targeted marketing and street advertising campaigns that put the brand's orange logos and themes directly on the backs of consumers. "If we are forced to communicate online, we will be forced to simplify," says Wallez. "Even on banking social networks, customers don't talk about banking. They talk about their homes and their cars and their financial concerns."

In 2006, ING Belgium conducted one direct marketing campaign per week. Last year, the marketing team conducted at least ten campaigns per day. How did they do it? They hired business analysts, campaign analysts, marketers, digital marketers and direct mailers. They developed a new campaign process and a new data platform, brought in new tools, and established new customer data ownership policies.

The bank now has a global client contact strategy, and the marketing department reports results directly to the board every week.

The benefits of ING's strategy include:
  • Fully automated service.
  • Migration of simple sales to direct channels.
  • More time for advice and sales in the branch network.
  • Increased advice efficiency through leads generated by direct marketing.
For such a wide-scale project, Wallez recommends strategy above all else. "Whatever strategy you use, you have to have a strategy. Consistently focus on strategy and free up resources," he says. "It's not easy but it's possible."

[Cross-posted from the sascom voices blog]
July 7, 2010
 
I use this blog to talk about all of the great information that is available to you on support.sas.com. I point out features that you may have missed (like the RSS feeds or Discussion Forums); I answer questions that you send via our feedback form (see the Q&A category); and I give you hints for searching and navigating.

Many SAS resources are available to you on support.sas.com, but locating and understanding them can be a taxing effort. If you like video, sit back, relax, and watch these two new videos from SAS. Each video was produced to help you get more from your SAS investment. One provides an overview of the resources that are available for our customers, and the second gives instructions on how to contact SAS Technical Support.

Discover SAS Customer Resources
The customer resources video includes a message from Stacy Hobson about customer loyalty as well as information about the SAS Users Groups, Publications, Training, Contracts, R&D, Technical Support, and support.sas.com. You can watch this video from the Community page on the support.sas.com site. The video is also available on the SAS Software YouTube channel.

Contacting SAS Technical Support
Did you know that SAS customers have unlimited access to SAS Technical Support? Watch the video to find out when and how to contact Technical Support. The video is available on the Support page of support.sas.com.
July 6, 2010
 
Do the digits of Pi appear in a random order? If so, the trillions of digits of Pi that have been calculated could serve as a useful random number generator. This post was inspired by this entry on Matt Asher's blog.

Generating pseudo-random numbers is a key piece of much of modern statistical practice, whether for Markov chain Monte Carlo applications or simpler simulations of what raw data would look like under given circumstances.

Generating sufficiently random pseudo-random numbers is not trivial, and many methods exist for doing so. Unfortunately, there's no way to prove that a series of pseudo-random numbers is indistinguishable from a truly random series. Instead, there are what amount to ad hoc tests, any of which an insufficiently random series might fail.
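
To see what one such ad hoc check looks like, here is a minimal sketch (with an arbitrary seed) that applies the same one-way chi-square test we use below to a million digits drawn from SAS's RAND function, a Mersenne twister generator:

data prng;
call streaminit(123456); /* arbitrary seed */
do i = 1 to 1000000;
digit = floor(10 * rand('uniform')); /* a pseudo-random digit, 0-9 */
output;
end;
drop i;
run;

proc freq data = prng;
tables digit / chisq;
run;

Passing such a check doesn't prove randomness, of course; it only fails to reject it.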

The first 10 million digits of Pi can be downloaded from here. Here, we explore a couple of simple ad hoc checks of randomness. We'll try something a little trickier in the next entry.


SAS

We start by reading in the data. The file contains one logical line with a length of 10,000,000. We tell SAS to read a one-digit variable and to hold our place in the line with the trailing @@, so that the next value is read from the same record. For later use, we'll also create a variable with the order of each digit, using the _n_ implied variable (section 1.4.15).


data test;
infile "c:\ken\pi-10million.txt" lrecl=10000000;
input digit 1. @@;
order = _n_;
run;


We can do a simple one-way chi-square test for equal probability of each digit in proc freq.


proc freq data = test;
tables digit/ chisq;
run;

Chi-Square 2.7838
DF 9
Pr > ChiSq 0.9723
Sample Size = 10000000


We didn't display the counts for each digit, but none was more than 0.11% away from the expected 1,000,000 occurrences.

Another simple check would be to assess autocorrelation. We can do this in proc autoreg. The dw=2 option calculates the Durbin-Watson statistic for adjacent and alternating residuals. We limit the observations to 1,000,000 digits for compatibility with R.


proc autoreg data=test (obs = 1000000);
model digit = / dw=2 dwprob;
run;
Durbin-Watson Statistics

Order DW Pr < DW Pr > DW
1 2.0028 0.9175 0.0825
2 1.9996 0.4130 0.5870

We might want to replicate this set of tests for series of 4 digits instead. To do this, we just tell the data step to read the line in 4-digit chunks.

data test4;
infile "c:\ken\pi-10million.txt" lrecl=10000000;
input digit4 4. @@;
order = _n_;
run;

proc freq data = test4;
tables digit4/ chisq;
run;
Chi-Square 9882.9520
DF 9999
Pr > ChiSq 0.7936

Sample Size = 2500000

proc autoreg data=test4 (obs = 1000000);
model digit4 = / dw=3 dwprob;
run;
Durbin-Watson Statistics

Order DW Pr < DW Pr > DW
1 2.0014 0.7527 0.2473
2 1.9976 0.1181 0.8819
3 2.0007 0.6397 0.3603

So far, we see no evidence of a lack of randomness.

R

In R, we use the readLines() function to create a 10,000,000-digit scalar object. In the following line we split the digits using the strsplit() function (as in section 6.4.1). This results in a list object, to which the as.numeric() function (which forces the digit characters to be read as numeric, section 1.4.2) cannot be applied directly. The unlist() function converts the list into a vector first, so that as.numeric() will work. Then the chisq.test() function performs the one-way chi-squared test.


mypi = readLines("c:/ken/pi-10million.txt", warn=FALSE)
piby1 = as.numeric(unlist(strsplit(mypi,"")))
chisq.test(table(piby1), p=rep(0.1, 10))

This generates the following output:

Chi-squared test for given probabilities

data: table(piby1)
X-squared = 2.7838, df = 9, p-value = 0.9723


Alternatively, it's trivial to write a function to automatically test for equal probabilities of all categories.


onewaychi = function(datavector){
  datatable = table(datavector)
  expect = rep(length(datavector)/length(datatable), length(datatable))
  chi2 = sum(((datatable - expect)^2)/expect)
  p = 1 - pchisq(chi2, length(datatable) - 1)
  return(p)
}

> onewaychi(piby1)
[1] 0.972252



The Durbin-Watson test can be generated by the dwtest function, from the lmtest package. Using all 10,000,000 digits causes an error, so we use only the first 1,000,000.


> library(lmtest)
> dwtest(lm(piby1[1:1000000] ~ 1))

Durbin-Watson test

data: lm(piby1[1:1e+06] ~ 1)
DW = 2.0028, p-value = 0.9176
alternative hypothesis: true autocorrelation is greater than 0


To examine the digits in groups of 4, we read the digit vector into a matrix with 4 columns, then multiply each column by the appropriate power of 10 and add the columns together. Alternatively, we could use the paste() function (section 1.4.5) to glue the digits together as a character string, then use the as.numeric() function to convert back to numbers.


> pimat = matrix(piby1, ncol = 4,byrow=TRUE)
> head(pimat)
[,1] [,2] [,3] [,4]
[1,] 1 4 1 5
[2,] 9 2 6 5
[3,] 3 5 8 9
[4,] 7 9 3 2
[5,] 3 8 4 6
[6,] 2 6 4 3

> piby4 = pimat[,1] * 1000 + pimat[,2] * 100 +
+ pimat[,3] * 10 + pimat[,4]
> head(piby4)
[1] 1415 9265 3589 7932 3846 2643

# alternate approach
# piby4_v2 = as.numeric(paste(pimat[,1], pimat[,2],
# pimat[,3], pimat[,4], sep=""))

> onewaychi(piby4)
[1] 0.7936358

> dwtest(lm(piby4[1:1000000] ~ 1))
Durbin-Watson test

data: lm(piby4[1:1e+06] ~ 1)
DW = 2.0014, p-value = 0.753
alternative hypothesis: true autocorrelation is greater than 0
July 2, 2010
 
I'm happy to report that I have achieved my SAS Certified Base Programmer credential. w00t! I took my first SAS programming course on March 10th and passed my exam on June 18th (so I guess a more accurate title would say 3 months and 8 days!).

I thought I would share a couple of notes from my experience. But first, I should point out that there are other great perspectives on certification preparation techniques. First, there is 10-year SAS programming vet Gongwei Chen's SAS Global Forum paper describing his 4 months of preparation for the Base and Advanced SAS programming credentials. There is also the PROC CERTIFY project from SAS Publishing employees Stacey Hamilton and Christine Kjellberg, two self-affirmed non-techies tackling certification with the assistance of SAS Publishing books such as the Certification Prep Guide.

So, what's my background? I have a degree in engineering. I'm a veteran of the telecom industry with more than a dozen years spent in network engineering and technical education. I've achieved data networking certification in the form of Cisco's CCNA. I consider myself a techie, dabbling in computers, web, and database technologies. I'm not a programmer, but I have programmed, learning FORTRAN in college and then C and VBA afterward. So in terms of technical/SAS background, I guess I'm somewhere on the continuum between Gongwei on one side and Stacey and Christine on the other.

Why did I seek SAS certification? Two reasons. First, I'm new to SAS and, being a bit of a techie, really wanted to learn SAS programming. Second, I work on the certification team here at SAS and wanted to experience SAS certification firsthand. Ethically, I would have to do this quickly so that I achieved my certification before I did any work on the Base Programming exam. Wouldn't really be fair to see the exam before taking the exam, now would it?

How did I prepare? Although I am a long-time fan of self-paced learning, I thought I would do something different this time and give classroom training a try. Specifically, I took the Programming 1 and Programming 2 courses, did a bunch of studying, wrote a bunch of programs, and then used the Certification e-Practice exam to ensure that I was ready to sit for the exam. These component courses may also be available to you at a discount via a Best Value Bundle, depending on your location.

Who do I think can benefit from my perspective? Maybe you would like to make a career change and take advantage of the 1,000+ SAS-related job postings on the major job boards. Maybe you already work with SAS code and would like to become more efficient with your coding time and gain some recognition of your SAS programming skills. Or maybe you just got your first SAS programming job and need to ramp up quickly.

In my next post, I'll share more specifics about how I studied and prepared for the exam. I hope you will find my perspective helpful.
July 1, 2010
 
We’re pleased to offer another segment of “The Nuts & Bolts of Social Media,” this time with my esteemed colleague, Justin Huntsman. In this short video, Deb talks to Justin about how he’s integrating social media into his campaigns. During the discussion, Justin provides some great ideas about how to ease social media “onto your docket” and balance traditional marketing with social media marketing.

Some of the most interesting points stem from the idea that the key to balancing traditional and social marketing activities is to establish your goals before you start. Related points include:

  • As you engage in both traditional and social marketing, if you begin with a goal, you’ll find your efforts align themselves automatically.
  • If you don’t begin with a goal, you likely will be making tradeoffs that in retrospect will end up costing you more than you imagined.

Integrating and balancing traditional and social marketing efforts is critical for marketers to be successful today. Justin offers an effective way to approach it: stop thinking of the two worlds as independent responsibilities, and think of social media as a means rather than an end.

Click on the screen below to tune in – it’s a short interview packed with some good insights.

June 29, 2010
 
Contributed by Marie Dexter, SAS
Are you a customer who is upgrading from SAS 8.2 to SAS 9.2? Or are you currently running SAS 9.2 and want to apply the third maintenance release for SAS 9.2? How can you learn about the new features and enhancements in the SAS products that you license? Is all of this information documented in one place?

The What's New in SAS 9.2 document includes What’s New topics for products that shipped with or depend on SAS 9.2. (Note one exception: If your product documentation is secured, then it is not included in What’s New in SAS 9.2. For more information about secure (requires login) product documentation, contact your SAS consultant.)

Each What’s New topic is cumulative for the SAS 9.2 release. For example, the “What’s New in SAS 9.2 Procedures” topic contains features and enhancements for SAS 9.2, the second maintenance release for SAS 9.2, and the third maintenance release for SAS 9.2. Features and enhancements that were part of a maintenance release are clearly labeled. For more information about maintenance releases, see support.sas.com/software/maintenance.

Some SAS products use their own product release numbers. For these products, What’s New topics for all product releases that shipped on top of SAS 9.2 are included in What’s New in SAS 9.2. For example, this document contains the What’s New topics for the following product releases:

  • SAS Enterprise Miner 6.1 and the SAS Enterprise Miner 6.1 maintenance release

  • the 9.2 and 9.22 releases of SAS/ETS, SAS/OR, and SAS/STAT

  • SAS Scoring Accelerator 1.6 for Teradata and SAS Scoring Accelerator 1.7 for Teradata

All of the What’s New topics in What’s New in SAS 9.2 are also available in the product documentation. For example, the “What’s New for SAS Language Reference” topic is available in SAS Language Reference: Dictionary. For more information about a specific new feature or enhancement, see the product documentation in the following locations:
  • the product documentation pages at support.sas.com/documentation

  • SAS OnlineDoc

  • the Help that is available within a SAS product

Please note your site might not license all of the products that are listed in What's New in SAS 9.2. Therefore, you might not be able to access the Help for all SAS products.

What's New topics are updated whenever there is an update or new release of a SAS product, so you should review the What's New topics for your products whenever you receive a product update.

To access the latest version of What's New in SAS 9.2, see support.sas.com/documentation/whatsnew. This product documentation page also provides links to the What's New documentation for SAS 8.2 and SAS®9.
June 29, 2010
     
Hello, readers new and old!

We started adding examples a year ago, in advance of the book's publication.

To mark the occasion, we're closing chapter 7 and starting chapter 8 next week. We've crafted a listing of all entries from the first year and made this available here.

For those wanting to keep score at home, the five entries most often viewed were:

Example 7.11: Plot an empirical cumulative distribution function from scratch
Example 7.8: Plot two empirical cumulative density functions using available tools
Example 7.2: Simulate data from a logistic regression
Example 7.34: Propensity scores and causal inference from observational studies
Example 7.30: Simulate censored survival data

Thanks for reading! We love comments, questions, and suggestions for new examples.

Ken and Nick
June 28, 2010
     

The raw data: here. Below, we forecast the US unemployment rate 12 months ahead with both PROC FORECAST and PROC ARIMA, then plot the two sets of predictions against the actual series.

/* read the monthly unemployment rate (UNRATE) series directly from FRED */
FILENAME data URL "http://research.stlouisfed.org/fred2/data/UNRATE.txt" DEBUG LRECL=100;

data one;
infile data;
input @1 year 4. @6 month 2. @9 day 2. uer;
DATE = MDY(MONTH,DAY,YEAR);
FORMAT DATE monyy7.;
drop year month day;
if date=. then delete; /* drop non-data lines, such as the file header */
run;

/* 12-month-ahead forecasts from PROC FORECAST */
proc forecast data=one out=uerp1 out1step
lead=12 interval=month;
id date;
var uer;
run;

/* 12-month-ahead forecasts from an ARMA(1,12) model in PROC ARIMA */
proc arima data=one;
i var=uer;
e p=1 q=12;
f lead=12 interval=month id=date out=uerp2;
run;
quit;

/* combine the actual series with the two sets of predictions */
data two;
set uerp1(keep=date uer rename=(uer=pforecast));
set uerp2(keep=date uer forecast rename=(uer=actual forecast=parima));
run;

title 'Unemployment rate';
ods listing close;
ods html file="prediction.html" style=gears image_dpi=300;

ods graphics /reset imagename='Table' imagefmt=gif;
proc sgplot data=two;
series x=date y=actual;
series x=date y=pforecast / legendlabel='prediction by proc forecast';
series x=date y=parima / legendlabel='prediction by proc arima';
refline '01mar2010'd '01mar2011'd / axis=x transparency=0.5 label=('Mar2010' 'Mar2011');
yaxis values=(3 to 12 by 1) label='Percentage';
where date>'01mar1990'd;
run;

ods html close;
ods listing;



June 26, 2010
     
In some data mining applications, a matrix norm has to be calculated; see, for instance, [1]. You can find a detailed explanation of matrix norms on Wikipedia here.

Instead of a user-written routine in a DATA step, we can obtain the "entrywise" norm via PROC FASTCLUS efficiently and accurately.

    
data matrix;
     input X1-X5;
datalines;
1 2 4 5 6
7 8 9 0 1
2 3 4 5 6
3 4 5 6 7
7 8 9 0 2
2 4 6 8 0
;
run;

data seed;
     input X1-X5;
datalines;
0 0 0 0 0
;
run;

options nosource;
proc export data=matrix  outfile='c:\matrix.csv'  dbms=csv replace; run;
options source;

/* one seed at the origin, MAXITER=0: DISTANCE is each row's distance from 0 */
proc fastclus data=matrix  seed=seed      out=norm(keep=DISTANCE)
              maxiter=0    maxclusters=1  noprint  ;
     var x1-x5;
run;
    
/*
In the output data set NORM, the variable DISTANCE is the Euclidean (L2)
norm of each row. If the LEAST=p option is specified, the L_p norm is
calculated instead; PROC FASTCLUS accepts p in the range [1, inf].

So far we have a vector norm for each row. The Frobenius norm of the
data matrix is the square root of the sum of the squared DISTANCE
values; the sum of squares itself is easily obtained through PROC MEANS
on a data view:
*/
data normv / view=normv;
     set norm(keep=DISTANCE);
     DISTANCE2=DISTANCE**2;
     drop DISTANCE;
run;
proc means data=normv noprint;
     var DISTANCE2;
     output  out=matrixnorm  sum(DISTANCE2)=Frobenius_sqr;
run;
    

You can use the following R code to verify the results:
    
mat <- read.csv('c:/matrix.csv', header=T)
# verify the per-row vector norms
vnorm <- apply(mat, 1, function(x){sqrt(sum(x^2))})
# verify the Frobenius norm of the matrix
x <- as.matrix(mat)
sqrt(sum(diag(t(x)%*%x)))
    

PS:
1. Of course, the above process is designed for implementing the randomized SVD in [1]. If only the Frobenius norm of the matrix is of interest, you can also use the following code snippet:

    
data matrixv/view=matrixv;
     set matrix;
     array _x{*}  x1-x5;
     array _y{*}  y1-y5;
     do j=1 to dim(_x);  _y[j]=_x[j]**2; end;
     keep y1-y5;
run;

proc means data=matrixv  noprint;
     var y1-y5;
     output  out=_var(drop=_TYPE_  _FREQ_)  sum= /autoname;
run;

data _null_;
     set _var;
     norm=sqrt(sum(of _numeric_));
     put norm=;
run;
/* --LOG WRITES:
norm=28.635642127
NOTE: There were 1 observations read from the data set WORK._VAR.
*/
    
    

2. Using its built-in computing engine for Euclidean distance, PROC FASTCLUS is also a powerful tool for finding the data point in a main table that is closest to a record in a lookup table. This technique is shown here and in [2]; a rough sketch follows below.
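
A minimal sketch of that lookup trick (the data set names MAIN and LOOKUP, the variables x1-x5, and the cluster count are all hypothetical): each lookup record becomes a fixed seed, and FASTCLUS tags every main-table row with its nearest seed (CLUSTER) and the Euclidean distance to it (DISTANCE).

/* Hypothetical names: MAIN, LOOKUP, x1-x5. MAXCLUSTERS= should equal
   the number of records in LOOKUP; MAXITER=0 and REPLACE=NONE keep
   the lookup records as fixed seeds. */
proc fastclus data=main  seed=lookup  maxclusters=10  maxiter=0
              replace=none  out=tagged  noprint;
     var x1-x5;
run;

/* within each cluster (= lookup record), keep the closest main-table row */
proc sort data=tagged;
     by CLUSTER DISTANCE;
run;
data closest;
     set tagged;
     by CLUSTER;
     if first.CLUSTER;
run;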


References:
[1] P. Drineas and M. W. Mahoney, "Randomized Algorithms for Matrices and Massive Data Sets," Proc. of the 32nd Annual Conference on Very Large Data Bases (VLDB), p. 1269, 2006.

[2] Dorfman, Paul M.; Vyverman, Koen; Dorfman, Victor P., "Black Belt Hashigana," Proc. of the 2010 SAS Global Forum, Seattle, WA, 2010.