July 3, 2009
 

It has been a long time since I last wrote a blog post, and I'm a little rusty.

This post explains how to use Java to access local SAS data. With this approach, you only need to have bought SAS Base to be able to access SAS data from Java.

Feel free to repost.



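1. Edit the file C:\WINDOWS\system32\drivers\etc\services and add the following line:
odbcserv              5061/tcp 
(You can choose any service name and port number.)
2. Create an ODBC data source in Control Panel.
When choosing the driver, select [SAS].
ODBC data source configuration, Servers tab: enter odbcserv as the server name, the same service name you added to the services file. When you click the Configure button, make sure the path to SAS.exe shown in the configuration dialog is correct.
ODBC data source configuration, Libraries tab: enter the library names and paths that you want to preassign.
3. Use the code in [Appendix 1: Java code] to access the data through the JDBC-ODBC bridge.
4. For other points to watch, see [Appendix 2: Notes].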
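The server name typed in step 2 is expected to match the service name added to the services file in step 1, so a typo there is an easy way to lose an afternoon. As a quick sanity check, here is a minimal Python sketch (the helper find_service is hypothetical, not part of SAS or ODBC) that looks a name up in services-file text. It assumes the usual layout of one entry per line, "name port/protocol [aliases...]", with anything after # treated as a comment.

```python
def find_service(services_text, name):
    """Look up `name` in the text of a services file.

    Returns (port, protocol) if a line's primary name or one of its
    aliases equals `name`, otherwise None. Text after '#' is a comment.
    """
    for raw_line in services_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # blank or malformed line
        port, protocol = fields[1].split("/", 1)
        if not port.isdigit():
            continue  # port must be numeric
        if name in [fields[0], *fields[2:]]:  # primary name plus aliases
            return int(port), protocol
    return None


# The entry from step 1, as it would appear in the services file.
sample = """# Local services
odbcserv              5061/tcp
"""

print(find_service(sample, "odbcserv"))   # (5061, 'tcp')
print(find_service(sample, "odbcservr"))  # None: a near-miss name is not found
```

To check the real file, you would read C:\WINDOWS\system32\drivers\etc\services into a string and pass that instead of sample.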

Appendix 1: Java code
import

 Posted at 11:37 AM
June 30, 2009
 
What have you been up to since SAS Global Forum 2009? For me, among other things, I’ve seen Mt Hood on a clear day while supporting the Pharmaceutical SAS users group (PharmaSUG) meeting in Portland, OR. I’ve managed to get to the local farmer’s market on a weekly basis (the tomatoes are in), and I’ve made it to the beach once. The kids are out of school, and the weather is predictably hot and humid here in Cary, NC. Seattle, the site of SGF 2010, is 2,893 miles away. But it just got a lot closer.

We had our official “Core Team” meeting this past week to start official conference planning. Program Manager Donna Daniels hosted 12 of us in a conference room and 2 via phone. So here we go again… and it’s a good thing. Ready, Set, Go! It takes a small army of us from SAS to help prepare for and support the conference. I get to work with folks from ISD, Corporate Communications, Video Communications and New Media, Travel, Security, Corporate Services, Accounting, Legal and Risk Management and more. It’s a huge, campus-wide team effort. But of course we are not working alone: we work closely and collaboratively with the Conference Chair, Section Chairs, and the SAS Global Users Group Executive Board. All of us come together with one goal: creating a meaningful and fun conference experience for the thousands of attendees we expect to see next April 11-14, 2010, at the Washington State Convention and Trade Center.

What's coming up soon? A revamped Web site debuts, and the Call for Papers opens in a few weeks. I hope that you’ll continue to follow my blog as I post about behind-the-scenes happenings. And if you’ve attended SGF before, I hope that you’ll let us know what keeps you coming back, so that we can keep doing the right things.

And if you haven’t attended yet – well it’s my job to whet your appetite with my view of SAS Global Forum.
June 27, 2009
 
SAS is sending data & text mining experts (including Teragram employees) over the ocean to Europe for two different events this week.
We'll have a booth in the exhibit hall at KDD09 Sunday through Wednesday. If you are one of the lucky ones attending KDD, mark your program for the panel discussion where Dr Wayne Thompson from SAS will talk about Emerging Trends in Open Standards and Cloud Computing for Data Mining.

Even if you don't make it to the KDD conference to pick up the new book by conference chair John Elder in person, you can experience our Software-on-Demand version of data mining by buying his book, "Handbook of Statistical Analysis and Data Mining Applications."


The second event where you can find us is A2009, the SAS conference devoted to analytics, in Denmark on July 1-2. The program is online; there you can read the abstract about a Swedish insurance firm that studied handwritten notes collected by police officers and security guards during 2004-2007.

At both shows, you'll be able to see live demos of our software and pick up a hard copy of the most recent fact sheet, which highlights the enhancements in Text Miner 4.1, released to customers five weeks ago. Those of you reading this blog who haven't yet seen it may want to read the fact sheet on our SAS 9.2 release of Text Miner on the SAS website.

What does your summer hold for you? Do you have travel plans to shows or conferences with text analytics tracks or sessions included? Please add a comment to this blog and do share!
June 20, 2009
 
A recent survey by IDG Research Services highlights Business Process Automation (BPA) as an IT priority.

Some of the findings include:
• More than 2/3 of respondents are automating most of their core business processes
• Another 21% are moving towards this goal
• 87% consider BPA to be a critical or important IT priority
• 87% see a connection between unified communications and process automation
• More than one third envision communication technology being incorporated into BPA in the future

Even though I have not spoken with Joe Staples and Brad Herrington from "Interactive Intelligence", I share their observation that many organizations in today’s economic environment are trying to streamline operations and do more with less. As organizations seek ways to be more efficient, in both the front office and the back office, we might position our technology as a tool for automating business processes that leads to improved business results. Have any of you used this approach to motivate your IT department to spend $$ on text analytics software, or to recruit support for your research program?

It's rare for BPA vendors to cover automating the manual processes that involve words or other unstructured content using text technologies. After I watch the June 25 webcast and get the white paper, I'll let you know whether it mentions natural language processing, content categorization, or sentiment analysis.

Meanwhile, it's up to all of us to keep promoting awareness of text analytics and to apply it in real-world situations. We aren't talking about some vague, futuristic possibility; the time is now to include text alongside the traditional data sources used in computer processing applications.

When we combine text analytics with mathematical optimization and predictive analytics, we can go well beyond merely automating business processes: we can improve existing processes and discover entirely new ones, leading to a sustainable future. Thanks for reading.



June 19, 2009
 
Original paper: http://support.sas.com/resources/papers/proceedings09/191-2009.pdf

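This technical paper uses a real data set on HIV-positive women to give a simple demonstration of how to test for a mediator (also called mediation) and a moderator in SAS. For definitions of mediator and moderator, see:

Mediator: http://davidakenny.net/cm/mediate.htm
Moderator: http://davidakenny.net/cm/moderation.htm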

First, the data: they are the first (baseline) interviews drawn from a longitudinal study and are analyzed cross-sectionally. The sample contains 280 HIV-infected women.

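Baron & Kenny (1986) first published the procedure for detecting a mediator. It requires fitting three regression models (Eq1, Eq2, Eq3) and checking four criteria (C1, C2, C3, C4). The three regression models are: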
(Eq1) IV -> DV
(Eq2) IV -> M (Mediator)
(Eq3) IV + M -> DV

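The first two criteria, C1 and C2, require the IV to be significant in Eq1 and in Eq2, respectively; if both hold, a mediator may be present. Two further criteria concern Eq3:
(C3) M must be significant in Eq3.
(C4) The IV's coefficient in Eq3 must drop to (essentially) 0.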

If all four criteria are met, M is a full mediator. If C4 is not met, M is called a partial mediator.

Finally, the Sobel test is used to test whether the indirect effect of the IV on the DV through the mediator is significantly different from zero.
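It may help to see how the three equations fit together. Using labels I'm introducing here (a = IV coefficient in Eq2, b = M coefficient in Eq3, c = IV coefficient in Eq1, c' = IV coefficient in Eq3), ordinary least squares on the same set of cases gives the exact identity c = c' + a*b: the total effect splits into a direct effect and the indirect effect a*b that the Sobel test examines. The Python sketch below uses simulated data, not the HIV data set, and NumPy rather than PROC REG; it only illustrates the arithmetic.

```python
import numpy as np

def ols(y, *predictors):
    """Fit y = b0 + b1*x1 + ... by least squares; return [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical simulated data: X affects Y directly and through M.
rng = np.random.default_rng(0)
n = 280
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)

c = ols(y, x)[1]               # Eq1: total effect of X on Y
a = ols(m, x)[1]               # Eq2: effect of X on M
_, c_prime, b = ols(y, x, m)   # Eq3: direct effect c' and M's effect b

print(f"c = {c:.3f}, c' = {c_prime:.3f}, a*b = {a * b:.3f}")
print(np.isclose(c, c_prime + a * b))  # True: total = direct + indirect
```

The identity only holds exactly when all three models are fitted on the same cases; if missing values drop different rows from different PROC REG steps, c and c' + a*b will not match.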

In this HIV Women data set, Available Social Support (tssqav) is the IV, Reason for Missing Medication (treas) is the DV, and Spiritual Activity (tcopesa) is the mediator.

We can fit Eq1 through Eq3 with three consecutive PROC REG steps:
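ods rtf;              /* route output to an RTF document */
ods listing close;    /* turn off the listing destination meanwhile */

/* Eq1: IV -> DV.  STB = standardized estimates; PCORR2 / SCORR2 =
   squared partial / semipartial correlations (Type II) */
proc reg data=two;
  model treas = tssqav / stb pcorr2 scorr2;
  title 'Regression model / step1 y=x';
run;
quit;

/* Eq2: IV -> M */
proc reg data=two;
  model tcopesa = tssqav / stb pcorr2 scorr2;
  title 'Regression model / step2 m=x';
run;
quit;

/* Eq3: IV + M -> DV */
proc reg data=two;
  model treas = tssqav tcopesa / stb pcorr2 scorr2;
  title 'Regression model / step3 y=m x';
run;
quit;

ods rtf close;        /* close the RTF file */
ods listing;          /* restore the listing destination */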
The first PROC REG gives β = -0.98 (p-value = 0.02), so C1 is met. The second gives β = 0.143 (p-value = 0.003), so C2 is also met. Eq3 gives (β1, β2) = (-0.79, -0.44) for the IV and M, with p-values of 0.055 and 0.02: M is still significant in Eq3, while the IV is no longer significant. C3 is therefore met, but because β1 did not drop close to 0, C4 is not met, so Spiritual Activity can only be called a partial mediator, not a full mediator.

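The paper does not go on to run a Sobel test. However, in 2004 Dr. Preacher (formerly a psychology professor at the University of North Carolina) and Dr. Hayes of Ohio State University published a paper on the Sobel test that includes a complete Sobel test SAS macro.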

Paper: http://www.comm.ohio-state.edu/ahayes/BRMIC2004.pdf
Macro: here
Syntax: %sobel(data=file, y=dv, x=iv, m=med, boot=z);

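Here, data is the input data set, y is the name of the DV variable, x is the name of the IV variable, m is the name of the mediator variable, and boot is the number of bootstrap resamples, which can be any number from 1000 to 1000000. If you don't want bootstrapping, set boot=0 and %sobel turns bootstrap resampling off.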
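To give a feel for what the boot option is doing, here is a sketch of one common variant, the percentile bootstrap of the indirect effect a*b (a = IV coefficient in Eq2, b = M coefficient in Eq3). It is written in Python with NumPy on simulated data; the function names are hypothetical, and the %sobel macro's exact resampling and interval method may differ from this.

```python
import numpy as np

def indirect_effect(x, m, y):
    """a*b, with a from Eq2 (M on X) and b from Eq3 (Y on X and M), by OLS."""
    ones = np.ones(len(x))
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    alpha = 1 - level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Hypothetical simulated data, not the HIV data set (true a*b = 0.2).
rng = np.random.default_rng(0)
x = rng.normal(size=280)
m = 0.5 * x + rng.normal(size=280)
y = 0.3 * x + 0.4 * m + rng.normal(size=280)

lower, upper = bootstrap_ci(x, m, y, n_boot=1000)
print(f"95% CI for a*b: [{lower:.3f}, {upper:.3f}]")
```

If the interval excludes 0, the indirect effect is judged significant; unlike the Sobel z, this does not assume that a*b is normally distributed.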

There are also many Sobel test calculators online; a Google search will turn them up.
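What those calculators compute is short enough to write out. With a and s_a the IV coefficient and its standard error from Eq2, and b and s_b the M coefficient and its standard error from Eq3, the Sobel statistic is z = a*b / sqrt(b^2 s_a^2 + a^2 s_b^2), compared against the standard normal. Below is a minimal Python sketch; the standard errors are not reported in this post, so the inputs are made-up numbers for illustration only.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """Sobel test for the indirect effect a*b.

    a, se_a: IV coefficient in Eq2 (M on IV) and its standard error
    b, se_b: M coefficient in Eq3 (DV on IV and M) and its standard error
    Returns (z, two-sided p-value) using the normal approximation.
    """
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = (a * b) / se_ab
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical inputs, not taken from the HIV paper.
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"z = {z:.3f}, p = {p:.4f}")
```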

To test for a moderator, fit the regression model IV + M + IV*M -> DV. If the coefficient of IV*M is significant, M moderates the relationship between the IV and the DV. The code is:

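ods rtf;              /* route output to an RTF document */
ods listing close;

/* Moderation model: IV + M + IV*M -> DV.
   sscopesa must already hold the product tssqav*tcopesa. */
proc reg data=two;
  model treas = tssqav tcopesa sscopesa / stb pcorr2 scorr2;
  title 'Regression model / testing moderator effect';
run;
quit;

ods rtf close;
ods listing;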
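Here sscopesa is the interaction term of tssqav and tcopesa. PROC REG does not accept syntax such as tssqav*tcopesa to denote an interaction, so before running this program you must create the interaction term as a new variable in a DATA step (for example, with the assignment sscopesa = tssqav * tcopesa;).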
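The interaction term is simply a new column holding IV × M for each case. As a rough illustration, here is a Python/NumPy sketch on simulated data (the variable names are hypothetical) that builds that column, fits IV + M + IV*M -> DV by least squares, and computes the t statistic for the interaction coefficient, the quantity PROC REG reports for sscopesa.

```python
import numpy as np

# Hypothetical simulated data (not the HIV data set) in which
# M really does moderate the effect of X on Y.
rng = np.random.default_rng(0)
n = 280
x = rng.normal(size=n)
m = rng.normal(size=n)
xm = x * m  # interaction column, like sscopesa = tssqav * tcopesa
y = 1.0 + 0.5 * x + 0.3 * m + 0.4 * xm + rng.normal(size=n)

# Fit Y = b0 + b1*X + b2*M + b3*(X*M) by least squares.
X = np.column_stack([np.ones(n), x, m, xm])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# t statistic for b3, the moderation coefficient.
resid = y - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])   # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the estimates
t_b3 = coef[3] / np.sqrt(cov[3, 3])
print(f"b3 = {coef[3]:.3f}, t = {t_b3:.2f}")
```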

The estimated coefficient of IV*M is β = 0.00175 with p-value = 0.5172, which is not significant, so the data give no evidence that Spiritual Activity is a moderator.
 Posted at 4:57 AM
June 17, 2009
 
SAS Publishing wants to get closer to the people who read, write, or dream of writing a SAS Press book. You can find SAS Publishing products and contact information on support.sas.com in the Bookstore. SAS Publishing is reaching out to you in other locations so that we can get to know each other better. Join the conversation.

  • Become a fan of SAS Publishing on Facebook.
  • Follow @SASPublishing on Twitter.
  • Join Fans of SAS Books on LinkedIn.

Watch this space for more ways to connect with SAS and SAS Publishing.

Text Speak

June 17, 2009
 
I just posted a tweet to my @ManyaMayes Twitter account. In order to get my message across, in 140 characters or less, I had to shorten my text. This is a very common practise for mobile phone users who send text messages that look a lot like a foreign language. My Mum writes messages that are so clipped that I have trouble deciphering them! As a BlackBerry user, I send email messages but I rarely send SMS messages. I've spent many years making sure I write messages that are easy for audiences to understand. It's going to take me a while to get used to writing clipped text (writing in text speak) as part of my job. It goes against much of my professional training to write like this: u no wot u no & u don't no wot u don't

How does text mining handle this? One approach would be to specify synonyms for these clipped terms:

u = you
no = know
wot = what

But "no" and "know" are both valid dictionary entries, so this immediately causes a follow-on problem: surely not every occurrence of "no" should be replaced with "know". Additional context from the document helps decide which occurrences of "no" should be replaced with "know". Boolean and linguistic rules can help with this.
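A minimal Python sketch of that idea: apply the unambiguous synonyms (u, wot) directly, and replace "no" with "know" only when a simple context rule fires. The rule used here (the previous word is a pronoun or "don't") is a hypothetical stand-in for real Boolean and linguistic rules, not how SAS Text Miner implements them.

```python
# Unambiguous clipped terms from the post and their expansions.
SYNONYMS = {"u": "you", "wot": "what"}

# Hypothetical context: words after which "no" is read as the verb "know".
KNOW_CONTEXT = {"u", "you", "i", "we", "they", "don't"}

def expand(text):
    """Expand text speak; rewrite 'no' as 'know' only in a verb context."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        lower = word.lower()
        prev = words[i - 1].lower() if i > 0 else ""
        if lower in SYNONYMS:
            out.append(SYNONYMS[lower])
        elif lower == "no" and prev in KNOW_CONTEXT:
            out.append("know")
        else:
            out.append(word)  # leave everything else untouched
    return " ".join(out)

print(expand("u no wot u no & u don't no wot u don't"))
# you know what you know & you don't know what you don't
print(expand("There is no way"))
# There is no way
```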

It can be difficult to solve data quality problems like this, and solutions are typically specific to both the data and the application. For example, how you would replace R&R depends on whether the data came from a forum where military personnel talk about upcoming "rest and relaxation" or from a warranty report describing "repair and replace" for a defective part or other...
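One simple way to make the expansion depend on the data is to keep a separate synonym list per source and pick the list from the document's origin. The Python sketch below assumes each document carries a source label; the labels and the EXPANSIONS table are hypothetical.

```python
# Hypothetical, source-specific expansions for the same abbreviation.
EXPANSIONS = {
    "military_forum": {"R&R": "rest and relaxation"},
    "warranty_report": {"R&R": "repair and replace"},
}

def normalize(text, source):
    """Expand abbreviations using the synonym list for the document's source."""
    for short, full in EXPANSIONS.get(source, {}).items():
        text = text.replace(short, full)
    return text  # unknown sources are returned unchanged

print(normalize("Looking forward to R&R next week", "military_forum"))
# Looking forward to rest and relaxation next week
print(normalize("Pump failed; R&R under warranty", "warranty_report"))
# Pump failed; repair and replace under warranty
```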
June 12, 2009
 
Original paper: http://support.sas.com/resources/papers/proceedings09/158-2009.pdf

I don't know how many of you already have SAS V9.2, but since I've got it, I'll start introducing some of the new features over the coming posts.

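First, let me demonstrate PROC SGPLOT, one of the newest graphics procedures in V9.2. Older releases of SAS did provide graphics procedures, but they were scattered across different procedures, which was inconvenient for users. They also had a long-standing weakness: the graphs they produced were of poor quality. ODS later helped improve this somewhat, and V9.2 now brings many of these plot types together in PROC SGPLOT. The SG in SGPLOT stands for Statistical Graphics. Let's take a look at what this new graphics procedure can do.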

 Posted at 3:23 AM