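Working with character variables often involves letter case. For this, SAS provides the functions upcase(), lowcase(), and propcase(), which are introduced one by one below.
1. upcase()
The syntax is upcase(argument), where argument is any SAS character expression. The function converts all letters to uppercase. If the length of the result variable has not been set, it takes the length of argument. If argument is blank, a blank value is returned.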
data _null_;
    name1 = upcase("Yuewei Liu");
    name2 = "Zhu Qi";
    name2 = upcase(name2);
    blank = upcase(" ");
    put name1 / name2 / blank;
run;

/* Result */
YUEWEI LIU
ZHU QI
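2. lowcase()
The syntax is lowcase(argument), where argument is any SAS character expression. The function converts all letters to lowercase. If the length of the result variable has not been set, it takes the length of argument. If argument is blank, a blank value is returned.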
data _null_;
    name1 = lowcase("Yuewei Liu");
    name2 = "Zhu Qi";
    name2 = lowcase(name2);
    blank = lowcase(" ");
    put name1 / name2 / blank;
run;

/* Result */
yuewei liu
zhu qi
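3. propcase()
The syntax is propcase(argument <, delimiter(s)>), where argument is any SAS character expression and delimiter(s) lists the characters that separate words. The default delimiters are blank, forward slash (/), hyphen (-), open parenthesis, period (.), and tab. If you specify delimiters yourself, the defaults, including the blank, no longer apply. The function first converts all uppercase letters to lowercase, then converts to uppercase the first letter of each word, as separated by the delimiters in effect. A few examples follow (taken from the SAS Help):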
/* 1. Change the case of a string */
data _null_;
    input place $ 1-40;
    name = propcase(place);
    put name;
    datalines;
INTRODUCTION TO THE SCIENCE OF ASTRONOMY
VIRGIN ISLANDS (U.S.)
SAINT KITTS/NEVIS
WINSTON-SALEM, N.C.
;
run;

/* Output */
Introduction To The Science Of Astronomy
Virgin Islands (U.S.)
Saint Kitts/Nevis
Winston-Salem, N.C.

/* 2. Use together with other functions */
data _null_;
    x = lowcase('THIS IS A DOG');
    y = propcase(x);
    z = propcase(lowcase('THIS IS A DOG'));
    put x=;
    put y=;
    put z=;
run;

/* Output */
x=this is a dog
y=This Is A Dog
z=This Is A Dog

/* 3. Use together with other functions */
data _null_;
    string1 = 'VERY RARE BOOKS';
    case1 = propcase(string1);
    put case1=;
    case2 = propcase(tranwrd(string1, 'VERY', ' '));
    put case2=;
run;

/* Output */
case1=Very Rare Books
case2= Rare Books

/* 4. Use with explicit delimiters */
options pageno=1 nodate ls=80 ps=64;
data names;
    infile datalines dlm='#';
    input CommonName : $20. CapsName : $20.;
    PropcaseName = propcase(capsname, " -'");
    datalines;
Delacroix, Eugene# EUGENE DELACROIX
O'Keeffe, Georgia# GEORGIA O'KEEFFE
Rockwell, Norman# NORMAN ROCKWELL
Burne-Jones, Edward# EDWARD BURNE-JONES
;
proc print data=names noobs;
    title 'Names of Artists';
run;

/* Output */
CommonName             CapsName              PropcaseName
Delacroix, Eugene      EUGENE DELACROIX      Eugene Delacroix
O'Keeffe, Georgia      GEORGIA O'KEEFFE      Georgia O'Keeffe
Rockwell, Norman       NORMAN ROCKWELL       Norman Rockwell
Burne-Jones, Edward    EDWARD BURNE-JONES    Edward Burne-Jones
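To see why example 4 passes its own delimiter list, it may help to run the same name through propcase() with and without the second argument. The following is only a minimal sketch; the variable names default_case and custom_case are made up for illustration.

```sas
data _null_;
    length name default_case custom_case $ 20;
    name = "GEORGIA O'KEEFFE";

    /* Default delimiters: the apostrophe is not one of them */
    default_case = propcase(name);

    /* Custom delimiters: blank, hyphen, and apostrophe */
    custom_case = propcase(name, " -'");

    put default_case= / custom_case=;
run;
```

Since the apostrophe is not a default delimiter, the first call is expected to leave the letter after it in lowercase, while the second call capitalizes it. Note that the blank has to be listed explicitly in the custom list, because specifying any delimiters replaces the defaults.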
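Each of these three functions has its own strengths in handling letter case, and they can be combined freely in day-to-day work. propcase() is especially useful for names, titles, and other text that needs initial capitals. Now for a small problem: how do you capitalize only the first letter of a string?
There are quite a few ways to do it; I will list just a few for now. propcase() is a neat option, but you have to specify a delimiter and make sure that delimiter does not occur in the string being converted. The whole string is then treated as a single word, so only its first letter is converted to uppercase. The example below also shows two approaches based on substr(), which give the same result here.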
data _null_;
    length char char1 char2 $ 15;
    char = "this is a test";
    char1 = propcase(char, "-");
    char2 = upcase(substr(char, 1, 1)) || substr(char, 2);
    char3 = char;
    substr(char3, 1, 1) = upcase(substr(char3, 1, 1));
    put char1= / char2= / char3=;
run;

char1=This is a test
char2=This is a test
char3=This is a test
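One difference between the approaches only shows up when the input already contains uppercase letters: propcase() lowercases everything first, while the substr() approaches touch only the first character. A small sketch with hypothetical variable names (s, via_propcase, via_substr):

```sas
data _null_;
    length s via_propcase via_substr $ 15;
    s = "this is a TEST";

    /* "-" does not occur in s, so the whole string counts as one word */
    via_propcase = propcase(s, "-");

    /* Only the first character is changed; the rest is left as it was */
    via_substr = upcase(substr(s, 1, 1)) || substr(s, 2);

    put via_propcase= / via_substr=;
run;
```

Here via_propcase becomes "This is a test" and via_substr becomes "This is a TEST", so the choice depends on whether existing capitals (acronyms, for instance) should survive.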
On the world-harmony-for-profit theme, he shared information about web sites such as Kiva.org that facilitate microfinancing around the world. There are other microfinance sites that help people closer to home (for us in the USA), but as Pogue said, only Kiva.org can give you that "rosy glow" when you know you're helping people in developing countries.
Kiva.org opens financial doors for people who might not have another source of funding; but it also presents a platform rich in data for analysis and reporting. The folks at Kiva.org support web services that allow you to build applications that reference the data that they collect. They also offer "data snapshots": downloadable versions of all of the data they have on the loans, loan recipients, and the lenders who participate.
If you could get this data into SAS, what insights could you glean? What cool stats could you produce? What stories could you tell with charts and plots?
So, now we come to your homework assignment...if you choose to accept it. I've already done the grunt work of writing a SAS program that transforms the raw data (from its XML format) into SAS data sets. I've even written a sample step that produces a simple chart based on the current data.
What can you do with this data using SAS? There are two data sets: lenders (over 400,000 records) and loans (over 165,000 records). They contain columns relating to geography (location of lenders and loan recipients), quantity (how many loans, what amounts), categories (loan purpose/industry, gender of recipient), and time (when the loan was granted/funded). You can read about the data on Kiva.org, and then create interesting reports using SAS.
Bonus assignment: can you improve my SAS program that pulls the data into SAS? I promise you: there is lots of room for optimization. (If I held off of this post until I perfected it, it would be ready for World Statistics Day 2011.) My implementation uses the XML libname engine, DATA step, and PROC SQL. It could be more automated (download the zip file with FILENAME URL, extract and process) and more efficient (faster appends, perhaps joining and summarizing for easier analysis). The program encounters a few errors when it runs, probably due to character encoding in the XML data. What would you do differently?
Here's how you can get started:
- Download my SAS program and XML map files from this ZIP file here (small, just about 3K).
- Extract the ZIP file to a new folder that your SAS session can access as the Kiva "root" folder (example: "C:\public\Kiva" or "/u/userid/Kiva").
- Download the data snapshot from Kiva.org (big, about 150MB ZIP file). You need the XML format (not the JSON format).
- Extract the data snapshot files into your Kiva "root" folder.
- Modify my kivaProgram.sas file to set the Kiva data root folder, and set the number of loan XML files and lender XML files (as described in the comments in the program).
(By the way, I wrote this program entirely using SAS Enterprise Guide 4.3. So I know that you can run it from there, or within whatever SAS 9.2 environment you have access to.)
What better way to celebrate World Statistics Day than to compute some statistics for the world? Post your experiences back here in the comments, or use sasCommunity.org to share more details and post the link.
A refresher on some basic logical propositions (or statements):
implication: if P then Q (P -> Q)
inverse: if not P then not Q (-P -> -Q)
converse: if Q then P (Q -> P)
contrapositive: if not Q then not P (-Q -> -P)
contradiction: if P then not Q (P -> -Q)
Mathematically (or logically) speaking, if the implication holds, then the contrapositive holds too, but the inverse does not necessarily hold. That is, from "if P then Q" we can conclude "if not Q then not P", but we cannot conclude "if not P then not Q".
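The claim can be checked by brute force over the four truth assignments. The sketch below rewrites each "if A then B" as "(not A) or B"; the data set name truth_table and the column names are made up for illustration.

```sas
data truth_table;
    do P = 1, 0;
        do Q = 1, 0;
            implication    = (not P) or Q;              /* P -> Q         */
            contrapositive = (not (not Q)) or (not P);  /* not Q -> not P */
            inverse        = (not (not P)) or (not Q);  /* not P -> not Q */
            converse       = (not Q) or P;              /* Q -> P         */
            output;
        end;
    end;
run;

proc print data = truth_table noobs;
run;
```

The implication and contrapositive columns agree in every row, while the inverse differs from the implication whenever P and Q have different truth values (for example P=0, Q=1). The inverse and converse columns also agree with each other, since each is the contrapositive of the other.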
Mr. Jones, if you get a grade below 70 on the final, you are going to flunk this course.
We adapt it in a logical implication form:
Mr. Jones, if P then Q, where
P: you get a grade below 70 on the final
Q: you are going to flunk this course
Considering the context, we can also take the inverse to hold: if you get a grade greater than or equal to 70, then you are going to pass this course (if not P then not Q).
Question: when you do statistical programming, what kind of logic do you use?
Answer: not purely mathematical logic. See:
if score<70 then grade="flunk"; *if P then Q;
else grade="pass"; *if not P then not Q;
Sounds like it might be a dry topic, right? Not at all. It turned out to be a fascinating story about how they were able to focus on solving a business problem, and how they turned it into a strategic advantage that drives customer satisfaction.
RCI stands for Resort Condominiums International, which pioneered vacation ownership exchange in 1974 and strives to create flexibility for two key markets - resort developers and vacation owners. Among the vacation owners, at any given time the property owners may also be property seekers, hence the concept of the exchange. RCI is the industry leader, having facilitated 2 million exchange vacations last year – three times more than its nearest competitor. As Sean proceeded to explain why they engage in demand forecasting, the beautiful symmetry of it all became apparent.
Every time a timeshare owner successfully finds another property they want to try, RCI earns a fee. The property seeker is happy when they get the property they want at the time they need it. The property owner is happy when they get the best exchange value they could get for their property at the time of the transaction. And RCI helps ensure a source of future revenue by making both the property owners and property seekers happy. And with an efficient exchange market, it provides an incentive for resort developers to participate in the RCI network as a key benefit to their customers, who buy into the properties to begin with. So all that happens smoothly as a result of accurately forecasting demand and setting prices accordingly.
When you’re able to make multiple stakeholders happy, while helping to drive future revenues and grow your business, that’s a value proposition that translates to success in any business. Sean started out by jokingly assuring us he would not try to talk us into a timeshare in Marbella or Miami, but after hearing his story, I’ll bet he might have had a few takers by the time he finished.
Last month I gave a talk, "XML: the SAS Approach", at CDISC Interchange China 2010 (at the Medical School of Fudan University, Shanghai, 2010-09-15). The FDA favors CDISC and HL7, two XML-based standards, so SAS programmers in the biopharmaceutical industry need to add XML technology to their toolboxes. Fortunately, you don't need to be an XML expert to work with XML in your daily work: the SAS system offers multiple tools and applications for handling XML files, that is, for importing and exporting XML data:
- SAS DATA step approach: import and export
- SAS XML LIBNAME engine: import and export
- SAS ODS XML statement (ODS MARKUP): export
- PROC CDISC: import and export
- SAS XML Mapper: import
- SAS CDISC Viewer: viewing only, which works much like an import
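As a rough illustration of the second item, the XML LIBNAME engine can write a data set to an XML file and read it back. This is a sketch only: the librefs xmlout and xmlin, the file name class.xml, and the use of the WORK directory and sashelp.class are assumptions chosen to keep the example self-contained.

```sas
/* Export: write sashelp.class to an XML file in the WORK directory */
libname xmlout xml "%sysfunc(pathname(work))/class.xml";

data xmlout.class;
    set sashelp.class;
run;

libname xmlout clear;

/* Import: read the same file back into a regular SAS data set */
libname xmlin xml "%sysfunc(pathname(work))/class.xml";

data class_back;
    set xmlin.class;
run;

libname xmlin clear;
```

This round trip works without an XML map because the engine reads back the same generic layout it wrote. Files with a different or deeper structure typically need a map, which is where SAS XML Mapper comes in.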
A simple demo. First, use FILE and PUT statements to generate an XML file:
data _null_;
    file "export.xml";
    put '<?xml version="1.0" encoding="windows-1252" ?>';
    put '<text> Welcome to CDISC Interchange 2010 China </text>';
    put '<text> We are in Shanghai! </text>';
run;
Then read the whole XML file into a SAS data set, one record per line:
data import0;
    infile "export.xml" truncover lrecl = 1024;
    input line $1024.;
    if line = '' then delete;
run;
Third step: extract the information you want (the text between the <text> and </text> tags) using Perl regular expressions:
data import (keep = line);
    retain queName;
    retain line;
    /* use PRX to capture the structure of XML data */
    if _n_ = 1 then do;
    /* use PRX to remove the XML tags */
    if queNameN > 0 then do;
The logic of the PRX approach to processing XML data is very simple and can easily be modified to suit your needs:
- extend the PRX code to capture the hierarchical structure of the XML data.
- remove the XML tags and output the information to a SAS data set.
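The three steps of the demo can be put together in one runnable sketch. It is not the code from the talk: the fileref xmldemo (a temporary file standing in for export.xml), the variable names rx and text, and the regular expression itself are assumptions made for illustration.

```sas
/* Hypothetical fileref: a temporary file stands in for export.xml */
filename xmldemo temp;

/* Step 1: write the XML lines with FILE and PUT */
data _null_;
    file xmldemo;
    put '<?xml version="1.0" encoding="windows-1252" ?>';
    put '<text> Welcome to CDISC Interchange 2010 China </text>';
    put '<text> We are in Shanghai! </text>';
run;

/* Steps 2 and 3: read each line, keep only the content of <text> elements */
data import (keep = text);
    infile xmldemo truncover lrecl = 1024;
    input line $1024.;
    length text $ 200;
    retain rx;

    /* Compile the pattern once; group 1 captures what sits between the tags */
    if _n_ = 1 then rx = prxparse('/<text>(.*?)<\/text>/');

    /* Lines without a <text> element (such as the XML declaration) are skipped */
    if prxmatch(rx, line) > 0 then do;
        text = strip(prxposn(rx, 1, line));
        output;
    end;
run;

proc print data = import noobs;
run;
```

This handles one flat element per line, which is all the demo file contains. Nested or multi-line elements would need more patterns and some retained state to track the hierarchy, or one of the dedicated tools listed earlier.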
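SAS Programming for Data Mining Applications, by oloolo: implements data mining algorithms with functionality equivalent to EM, using Base SAS, SAS/STAT, and other programming modules
Statecompute, by WenSui Liu: market research, data mining, and related topics, using SAS, R, Python, and other software
一个SASor的技术空间 (A SASor's Tech Space), by sxlion: SAS programming for data cleaning, data wrangling, and all kinds of practical problems
一个SASor的图表空间 (A SASor's Chart Space), by sxlion: focuses on business, academic, and industrial charts, using SAS to imitate BusinessWeek-style graphics
SUGI CLUB: selects papers from the various SUGs, translates them, and adds a few notes; broad coverage
风 (Feng), by anyjack: all kinds of SAS techniques, leaning toward IT
Languages and systems that offer some or all of the functionality of SAS: R, Python, C/C++, Java, Oracle, Weka, Rapidminer, Clementine, JMP, Matlab, S-plus, Teradata, SAP.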