October 12, 2009
 
More questions answered:
Q: I'm looking for a simple install guide for 9.2
A: I couldn't answer the question about why SAS 9.2 has different installation documentation, so I enlisted the help of Fred Perry who is a major contributor of content to the Install Center.

How do I find my usual installation documentation?
In SAS 9.2, we introduced the SAS Deployment Wizard to replace the SAS Software Navigator. The SAS Deployment Wizard is a far more robust tool to drive the deployment of your software. Here are some improvements you may notice:

  • The SAS Deployment Wizard allows a quiet or silent installation, which was difficult to perform with the SAS Software Navigator.


  • The SAS Deployment Wizard prompts you for installation and configuration information at the beginning of your deployment, so you provide the information only once and then allow installation to continue unattended.


  • The SAS Deployment Wizard also replaces the SAS Configuration Wizard so you have the option to install SAS and run the configuration at the same time or perform them separately.

Instead of one or two generic paths through the deployment tool, the SAS Deployment Wizard supports a path that is tailored for your individual software order. With such a big change in the way SAS software is deployed, the documentation path for SAS 9.2 has also been updated.

Your installation experience begins with a Software Order E-mail that each customer receives. This customized starting point explains exactly what you need to do to install your software. It also includes a list of the other documents you might need and how to find those documents.

  • For Windows and UNIX users, your installation path continues with SAS QuickStart Guides. Then you will move through the SAS Deployment Wizard and find targeted explanations in its Help dialogs.


  • For z/OS users, your Software Order E-mail directs you to a set of installation instructions specific to your type of deployment.

For deployment on any of these hosts, you will finish with the configuration information available for the products in your software order.

Install Center continues to offer valuable documentation, the same way it did in releases of SAS software prior to SAS 9.2. Your documentation path describes when you need to visit Install Center to retrieve that documentation.

Now that the deployment path is more personalized, you may notice fewer documents in the "Installation Instructions" section of Install Center. That change is part of our effort to produce a less generic, more efficient deployment experience for you. Users have told us that, from the Software Order E-mail to the completion of any configuration steps, deployment of SAS software is faster than ever before. We hope these improvements provide an easier deployment experience for you as well.
October 10, 2009
 
Graduate and undergraduate students are eligible to apply for the SAS Student Ambassador Program, which covers the cost of travel expenses and registration fees paid to attend SAS Global Forum 2010.
What’s the “catch?” You must complete an application and submit a presentation abstract and working draft by midnight October 26, 2009, Eastern Standard Time.
What are the benefits? You’ll have the opportunity to present before an international audience, network with SAS users from every industry and sector . . . and have fun in Seattle.
October 10, 2009
 
We have recently added feedback forms and short surveys on some pages on support.sas.com. We don't ask for your e-mail address or require you to log in to the site. That means that all of the comments are anonymous. We think the ease of commenting will increase the amount of good feedback we receive. It seems to be working.

I have found a downside. Some of the comments express frustration about content that can't be found. If only I could contact the person and offer some help. I can't do that, but a girl with a blog can provide answers as blog posts. The comments are usually short, so I have to guess at the detailed meaning. May I get a few right!

Here's my first attempt.

Comment: I just wanted some common sample programs for statistical analysis.
Response: When looking for statistical analysis samples, bookmark these links:

Tweak PROC FASTCLUS for 1-Nearest Neighbor / Closest Match

September 24, 2009
 


In most table lookup tasks, we are doing EXACT matching. However, sometimes we are looking for the closest match in the lookup table. By 'closest', we mean the smallest Euclidean distance:


$\|X - Y\|_2$

Typically we have to code the search manually in a DATA step, using either an array or a hash object.

But if we only care about the single closest point in the lookup table, we can also tweak PROC FASTCLUS for a simple yet fast implementation. This is the same as a 1-Nearest Neighbor calculation.

Here is an example with 2-dimension data and Euclidean Distance:

data fix;
input x y;
datalines;
1 3
2 4
3 5
8 0.2
15 1
;
run;
data have;
input x y;
datalines;
1.2 6
0.3 4
10 1.2
7 1
2.9 4
;
run;
data fix;
   set fix;
   CLUSTER=_n_;
run;

%let dsid=%sysfunc(open(fix));
%let ntotal=%sysfunc(attrn(&dsid, NOBS));
%let dsid=%sysfunc(close(&dsid));
proc fastclus data=have out=have2
              seed=fix  maxclusters=&ntotal 
              noprint maxiter=0 ;
     var x y;
run;
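
As a cross-check outside SAS, the same closest-match assignment can be sketched in a few lines of Python. This is only an illustrative brute-force version, not what PROC FASTCLUS does internally; the function name `nearest_cluster` is made up for this sketch, and the points are copied from the `fix` and `have` data sets above.

```python
import math

# Lookup ("fix") points and query ("have") points copied from the SAS example.
fix = [(1, 3), (2, 4), (3, 5), (8, 0.2), (15, 1)]
have = [(1.2, 6), (0.3, 4), (10, 1.2), (7, 1), (2.9, 4)]

def nearest_cluster(point, seeds):
    """Return the 1-based index of the seed closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, seed) for seed in seeds]
    return distances.index(min(distances)) + 1

# One cluster number per row of `have`, numbered like CLUSTER=_n_ in the fix data set.
clusters = [nearest_cluster(p, fix) for p in have]
print(clusters)  # [3, 1, 4, 4, 2]
```

Each number is the row of `fix` that lies nearest to the corresponding row of `have`, which is the value one would expect to see in the CLUSTER variable of the output data set.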


Comparing the hash object solution, courtesy of Paul Dorfman, and my binary search approach:


data closest (drop = _:) ;
   array _f [99999] _temporary_ ;
   do _h = 2 by 1 until (z) ;
      set fix end = z ;
      _f[_h] = fix ;
   end ;
   _f[_h+1] = constant ("big") ;
   do until (0) ;
      set num ;
      do _j = 2 by 1 until (_f[_j-1] <= num <= _f[_j]) ;
      end ;
      if _j = 2 then closest = _f[ 2] ;
      else if _j = _h + 1 then closest = _f[_h] ;
      else if sum (num, - _f[_j-1]) < sum (_f[_j], -num)
         then closest = _f[_j-1] ;
      else closest = _f[_j] ;
      output ;
   end ;
   drop fix;
run ;



NOTE: There were 1000 observations read from the data set WORK.FIX.
NOTE: There were 1000000 observations read from the data set WORK.NUM.
NOTE: The data set WORK.CLOSEST has 1000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 40.61 seconds
cpu time 40.64 seconds

%let dsid=%sysfunc(open(fix));
%let ntotal=%sysfunc(attrn(&dsid, NOBS));
%let dsid=%sysfunc(close(&dsid));
proc fastclus data=num(rename=(num=fix)) seed=fix cluster=CLASS
              maxclusters=&ntotal noprint maxiter=0 out=num2;
     var fix;
run;

NOTE: There were 1000 observations read from the data set WORK.FIX.
NOTE: The data set WORK.NUM2 has 1000000 observations and 4 variables.
NOTE: PROCEDURE FASTCLUS used (Total process time):
real time 6.41 seconds
cpu time 6.42 seconds

data compare;
   merge closest(keep=j closest) num2(keep=j Class) end=eof;
   by j;
   retain d 0;
   d+(closest ne Class);
   if eof then put d=;
run;

d=0

NOTE: There were 1000000 observations read from the data set WORK.CLOSEST.
NOTE: There were 1000000 observations read from the data set WORK.NUM2.
NOTE: The data set WORK.COMPARE has 1000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 3.92 seconds
cpu time 0.53 seconds


Binary search:



data have ;
input num @@ ;
cards ;
2.2 3 4.4 5 6.6 7 8.8 9 11.11 12.12 14.9 15.01
run ;

data fix ;
input num @@ ;
cards ;
3 6 9 12 15
run ;


%let dsid=%sysfunc(open(fix));
%let ntotal=%sysfunc(attrn(&dsid, NOBS));
%let dsid=%sysfunc(close(&dsid));
%put &ntotal;
proc sort data=fix; by num; run;
data have2;
array _f{1:&ntotal} _temporary_;
if _n_=1 then do;
do i=1 to &ntotal;
set fix point=i;
_f[i]=num;
end;
end;
set have;
L=0; U=&ntotal; break=0;
if num<=_f[1] then CLUSTER=_f[1];
else if num>=_f[&ntotal] then CLUSTER=_f[&ntotal];
else do;
   /* Binary search for the first lookup value >= num */
   do while (L < U & break=0);
      mid=L+int((U-L)/2);
      if _f[mid]=num then do; L=mid; break=1; end;
      else do;
         if _f[mid]< num then L=mid+1;
         else U=mid;
      end;
   end;
   /* Compare the neighbors on either side and keep the nearer one */
   if mid=L then mid=max(1, mid-1);
   if abs(num-_f[mid])< abs(num-_f[L]) then CLUSTER=_f[mid];
   else CLUSTER=_f[L];
end;

put mid= L= U= num= CLUSTER=;
run;
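
For readers who want to check the binary search logic outside SAS, here is a minimal Python sketch of the same idea using the standard-library `bisect` module. The function name `closest_value` is hypothetical, and ties are sent to the larger lookup value to mirror the strict `<` comparison in the DATA step above.

```python
from bisect import bisect_left

def closest_value(sorted_fix, num):
    """Return the element of `sorted_fix` nearest to `num` via binary search."""
    i = bisect_left(sorted_fix, num)      # first index with sorted_fix[i] >= num
    if i == 0:
        return sorted_fix[0]              # num is at or below the smallest value
    if i == len(sorted_fix):
        return sorted_fix[-1]             # num is above the largest value
    lower, upper = sorted_fix[i - 1], sorted_fix[i]
    return lower if num - lower < upper - num else upper

# Same values as the have and fix data sets above.
have = [2.2, 3, 4.4, 5, 6.6, 7, 8.8, 9, 11.11, 12.12, 14.9, 15.01]
fix = [3, 6, 9, 12, 15]

matches = [closest_value(fix, n) for n in have]
print(matches)  # [3, 3, 3, 6, 6, 6, 9, 9, 12, 12, 15, 15]
```

Each lookup costs O(log n) comparisons instead of a full scan, which is the point of sorting `fix` first.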



Robustness of the PROC FASTCLUS solution:
The correctness of the PROC FASTCLUS result is guaranteed, as demonstrated here.


Be sure to specify "maxiter=0" in the PROC FASTCLUS options.

Sample code:



******************;
data have ;
input num @@ ;
cards ;
2.2 3 4.4 5 6.6 7 8.8 9 11.11 12.12 14.9 15.01
run ;

data fix ;
input num @@ ;
cards ;
3 6 9 12 15
run ;

data fix; set fix; CLUSTER=_n_; run;
data have; set have; id=_n_; run;


%let dsid=%sysfunc(open(fix));
%let ntotal=%sysfunc(attrn(&dsid, NOBS));
%let dsid=%sysfunc(close(&dsid));
proc fastclus data=have seed=fix 
              maxclusters=&ntotal noprint maxiter=0
              out=have2 cluster=CLUSTER;
     var num;
run;

proc sql;
create table have2 as
select a.*, b.num as fix
from have2 as a
join fix as b
on a.CLUSTER=b.CLUSTER
order by a.num
;
quit;

proc print data=have2; run;
****************;

 Posted at 2:03 AM
September 23, 2009
 
I can’t take credit for selecting Seattle as the site of SAS Global Forum 2010. But I can thank the SAS Global Forum Executive Board for making that decision a number of years ago. We had a terrific conference the last time (SUGI 28 in 2003), and that must be one reason we’re going back. Personally, I’m planning on adding some vacation time in conjunction with business. How about you?

If you’re not into shopping, art museums, or kayaking, then you may want to check out the Geek’s Guide to Seattle. Not that I’m calling anyone a Geek, but if you’re like me, you may be interested in seeing the first wireless telephone (from 1910) on display in Seattle’s Museum Of History & Industry. Or you might want to experience flight without leaving the ground at the Museum of Flight’s flight simulators. There you can try your hand at WWII dogfights, hang gliders, or even landing on the Moon! Ever wonder where Microsoft chairman Bill Gates lives? His high-tech Medina mansion is on the shores of Lake Washington and is notable enough to have its own Wikipedia page. Have fun!
September 20, 2009
 
We launched a new discussion forum at the end of August. The SAS/GRAPH and ODS Graphics forum provides a place to discuss the visual representation of data, to share your questions, suggestions, experiences, and pains regarding graphics. Join the discussion about SAS graphics.

We all love to see our data represented in fancy graphs and we love the fact that these cool visual displays can get so much attention for our data. I bet that almost every report you generate includes at least one graphic. And, I bet that you are always looking for ways to pump up the visual elements included in your reports. SAS/GRAPH software and ODS Graphics can help you in that effort. Review the list of resources below.

Update, Nov 6:
This on-demand webinar presented by Bob Rodriguez is a companion to the Getting Started with ODS Statistical Graphics paper referenced above. Watch the webinar now.
[end update]

Here's one example that you can find in the samples gallery. Enjoy!

The following graph was produced using PROC SGRENDER. It is a Monthly Stock Price and Volume Graph.

[Figure: Stock Plot]


September 14, 2009
 
I'm in Burlington, VT for NESUG 09. I've never been here, so I got up early this morning to go for a short run to clear my head for a day on the demo floor and to get a look around. I found my way to Battery Park and thought the views were nice. Then I turned around to head to the lake front. The beauty of the morning made me stop and walk to take it all in. The pink and orange of the sunrise was glowing on the mountains across Lake Champlain. Wow. I resisted the temptation to sit in one of the swings along the lake and just watch the morning unfold. With that kind of inspiration, I'm ready for the day and looking forward to meeting some of you.
If you are at NESUG, I have two suggestions for you.

  • Go for a walk down by the lake, early in the morning if you can.

  • Come by the demo room and visit the SAS staff, partners, and poster presenters.


I have two demos while at NESUG. One will focus on ways to network, collaborate and influence SAS and other SAS users from support.sas.com. The other will focus on new features recently launched or coming soon to support.sas.com.
If you aren't at NESUG, you won't be left out. You can get this information at other regional users group meetings or by following this blog over the next few days.

One last note: If you are on Twitter, you can keep up with happenings at the conference by following the #NESUG09 hashtag.
September 13, 2009
 

#################################################
# A PIECE OF R SNIPPET TO PULL EXCHANGE RATES   #
# FROM WWW.OANDA.COM AND TO CALCULATE USD INDEX #
#################################################

library(fImport)

# DOWNLOAD EACH COMPONENT OF USDX FROM WWW.OANDA.COM
eur.usd <- oandaSeries("EUR/USD")
usd.jpy <- oandaSeries("USD/JPY")
gbp.usd <- oandaSeries("GBP/USD")
usd.cad <- oandaSeries("USD/CAD")
usd.sek <- oandaSeries("USD/SEK")
usd.chf <- oandaSeries("USD/CHF")

plot(cbind(eur.usd, usd.jpy, gbp.usd, usd.cad, usd.sek, usd.chf), 
     main = 'Components of USD Index')


# CALCULATE USDX BY COMBINING ALL COMPONENTS
# REFERENCE: http://en.wikipedia.org/wiki/USD_Index

usdx <- 50.14348112 * (eur.usd ^ (-0.576)) *
        (usd.jpy ^ 0.136) * (gbp.usd ^ (-0.119)) *
        (usd.cad ^ 0.091) * (usd.sek ^ 0.042) *
        (usd.chf ^ 0.036)
colnames(usdx) <- "USD Index"

plot(usdx, main = "USD Index")
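
To see how the weighted product behaves for a single day, the formula can be worked through in Python. This sketch reuses the scale constant and exponents from the R code above; the function name `usd_index` and the sample rates are hypothetical and chosen only to illustrate the arithmetic, not taken from any data source.

```python
# Scale constant and exponents copied from the R formula above.
SCALE = 50.14348112
WEIGHTS = {
    "EUR/USD": -0.576,
    "USD/JPY": 0.136,
    "GBP/USD": -0.119,
    "USD/CAD": 0.091,
    "USD/SEK": 0.042,
    "USD/CHF": 0.036,
}

def usd_index(rates):
    """Weighted geometric combination of the six exchange rates."""
    index = SCALE
    for pair, weight in WEIGHTS.items():
        index *= rates[pair] ** weight
    return index

# Hypothetical rates, for illustration only.
sample_rates = {
    "EUR/USD": 1.45, "USD/JPY": 91.0, "GBP/USD": 1.66,
    "USD/CAD": 1.09, "USD/SEK": 7.05, "USD/CHF": 1.04,
}
print(round(usd_index(sample_rates), 2))
```

The exponents are negative for the pairs quoted as foreign currency per... rather, for the pairs quoted in USD per unit of foreign currency (EUR/USD and GBP/USD), so a stronger euro or pound lowers the index, and the absolute weights sum to 1.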

 Posted at 6:31 AM
September 11, 2009
 
Last month at the SIGIR meeting in Boston, one of the presentations given by a Teragram customer attracted notice in a Twitter post.

The NY Times has automated topic tagging for its website by implementing software that builds its indexes automatically. However, as the tweet points out, the machine has NOT replaced Man, because the newspaper continues to rely on MANUAL entries by the people who build and maintain the New York Times Index, a more traditional index.



Stephen Arnold wondered in his blog why an organization might continue to require human labor for a task a machine can now perform. Could it be political resistance to change? Or perhaps the machine sometimes fails? Or perhaps employees whose skills do not transfer to other roles are simply next in line for a "pink slip" as budgets get cut?

Mr. Arnold's ideas are all valid possibilities, and I've seen cases of each in my experience transferring technology from the research lab into business production environments. Those who put a stake in the ground and step forward to be the first role models for how text analytics can carry their business forward ought to pause and consider their own culture.

Since the original question was about a Teragram customer implementation, I asked Saratendu Sethi, the director of Engineering at Teragram, to share what he's observed in his consulting engagements. Here is his response.

First of all, even if automatic categorization guarantees >99% accuracy, it is absolutely critical for a news company not to portray wrong information, even in the remaining 1% of cases. This can only be verified by having humans validate the categorization results. They are doing that on a subset of articles, e.g. front-page articles.

Secondly, new topics constantly emerge in the coverage of current events. Even the best text mining algorithms can’t achieve perfection in spotting emerging topics because these algorithms are usually based on processing past content. Also, the definition of emerging topics is based on human perception, which is affected by time, location and the type of entities involved in the event. Therefore, these topics have to be manually spotted and added to documents/taxonomy while they are emerging.

Having said that, the following are four benefits that Teragram categorization achieves for the New York Times:
(1) If two people are asked to suggest categories for the same document on their own, they are always going to come up with different categories. Automatic categorization enforces consistency and removes human subjectivity by automatically suggesting categories to them.
(2) Automatic categorization saves time because it is easier to ask editors to select appropriate categories from an automatically generated list than to have them think up the categories themselves. With automatic categorization, I can spend just a few seconds, but with manual categorization I need a few minutes to read the content and decide on the appropriate topics.
(3) Entity extraction (e.g. identifying persons, locations, etc.), which doesn’t require much human input, is automated.
(4) Automatic categorization enables the New York Times to process all its past archives. Currently the New York Times re-processes its past 25 years of content with updated taxonomies every few months.

4a. The human editors review articles only for the current day (~500-1000 articles/day), whereas the past archives might include 100K articles/year.

4b. If “swine flu” was only identified as a news topic in 2009, then automatic categorization allows the NYT to find out what other news appeared in the past.

So what do you conclude from this post? How would YOU answer the question posed in the title of this entry? Is it cost effective to apply MAN "and" Machine together, or has the science progressed enough to replace MAN? Is it time to choose a Man "or" Machine approach when deciding how to become more efficient?

Saratendu answers with the "AND" operator, and that's the answer I prefer too, because I'm not comfortable letting those sci-fi robots and machines take over my world.

How about you?