String

July 27, 2021
 

In the past, the COMPRESS function was useful. Since SAS version 9, it has become a blockbuster, and you might not have noticed. The major change was the addition of a new optional parameter called MODIFIERS.

The traditional use of the COMPRESS function was to remove blanks or a list of selected characters from a character string. The addition of a MODIFIER argument does two things. First, you can specify classes of characters to remove, such as all letters, all punctuation marks, or all digits. That is extremely useful, but the addition of the 'k' modifier is why I used the term blockbuster in my description. The 'k' modifier flips the function from one that removes characters from a string to one that keeps a list of characters and removes everything else. Let me show you some examples.

This first example stems from a real problem I encountered while trying to read values that contained units. My data looked something like this:

ID     Weight 
001    100lbs.
002     59Kgs.
003    210LBS
004    83kg

My goal was to create a variable called Wt that represented the person's weight in pounds as a numeric value.

First, let’s look at the code. Then, I’ll give an explanation.

data Convert;
   length ID $3 Weight $8;
   input ID Weight;
 
   Wt = input(compress(Weight,,'kd'),8.);
   /* The COMPRESS function uses two modifiers, 'k' and 'd'.  This means
      keep the digits, remove anything else.  The INPUT function does the
      character-to-numeric conversion.
   */
 
   If findc(Weight,'k','i') then Wt = Wt * 2.2;
 
   /* the FINDC function is looking for an upper or lowercase 'k' in the
      original character string.  If found, it converts the value in
      kilograms to pounds (note: 1 kg = 2.2 pounds).
   */
 
datalines;
001    100lbs.
002     59Kgs.
003    210LBS
004    83kg
;
title "Listing of Data Set Convert";
footnote "This program was run using SAS OnDemand for Academics";
proc print data=Convert noobs;
run;

The program reads the value of Weight as a character string. The COMPRESS function uses 'k' and 'd' as modifiers. Notice the two commas in the list of arguments. With a single comma, SAS would interpret 'kd' as the second argument (the list of characters to remove). The two consecutive commas tell the function that the second argument is omitted and that 'kd' is the third argument (modifiers). You can list these modifiers in any order, but I like to use 'kd', and I think of it as "keep the digits." What remains is the string of digits. The INPUT function does the character-to-numeric conversion.

Your next step is to figure out if the original value of Weight contained an upper or lowercase 'k'. The FINDC function can take three arguments: the first is the string that you are examining, the second is a list of characters that you are searching for, and the third argument is the 'i' modifier that says, "ignore case" (very useful).

If the original character string (Weight) contains an uppercase or lowercase 'k', you convert from kilograms to pounds.

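Here is the output: Wt is 100 for ID 001, 129.8 for ID 002 (59 kg × 2.2), 210 for ID 003, and 182.6 for ID 004 (83 kg × 2.2).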

There is one more useful application of the COMPRESS function that I want to discuss. Occasionally, you might have a text file in ASCII or EBCDIC that contains non-printing characters (usually placed there in error). Suppose you want just the digits, decimal points (periods), blanks, and commas. You need to read the original value as a text string. Let's call the original string Contains_Junk. All you need to convert these values is one line of code like this:

Valid = compress(Contains_Junk,'.,','kdas');

In this example, you are using all three arguments of the COMPRESS function. As in pre-9 versions of SAS, the second argument is a list of characters that you want to remove. However, because the third argument (modifiers) contains a 'k', the second argument is a list of characters that you want to keep. In addition to periods and commas, you use modifiers to include all digits, uppercase and lowercase letters (the 'a' modifier - 'a' for alpha), and space characters (these include spaces, tabs, and a few others such as carriage returns and linefeeds). If you did not want to include tabs and other "white space" characters, you could rewrite this line as:

Valid = compress(Contains_Junk,'., ','kd');

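Here you are including a blank in the second argument and omitting the 's' in the modifier list. Note that this version also omits the 'a' modifier, so letters are removed as well, which matches the stated goal of keeping only digits, periods, blanks, and commas.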
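To see the effect on a concrete value, here is a minimal sketch. The data set name Clean and the test string are made up for illustration; a tab ('09'x) and a null byte ('00'x) stand in for the non-printing "junk."

```sas
data Clean;
   length Contains_Junk Valid $20;
   /* Hypothetical test value: '09'x is a tab, '00'x is a null byte */
   Contains_Junk = '1,234' || '09'x || '.56' || '00'x;
   /* Keep digits ('d') plus the listed period, comma, and blank */
   Valid = compress(Contains_Junk, '., ', 'kd');
   put Valid=;   /* writes Valid=1,234.56 to the log */
run;
```

Because the tab is not in the keep list and the 's' modifier is not used, the tab is removed along with the null byte.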

Questions and/or comments are welcome.

The Amazing COMPRESS Function was published on SAS Users.

May 29, 2020
 

While working at the Rutgers Robert Wood Johnson Medical School, I had access to data on over ten million visits to emergency departments in central New Jersey, including ICD-9 (International Classification of Disease – 9th edition) codes along with some patient demographic data.

I also had the ozone level from several central New Jersey monitoring stations for every hour of the day for ten years. I used PROC REG (and ARIMA) to assess the association between ozone levels and the number of admissions to emergency departments diagnosed as asthma. Some of the predictor variables, besides ozone level, were pollen levels and a dichotomous variable indicating if the date fell on a weekend. (On weekdays, patients were more likely to visit their personal physician than on a weekend.) The study showed a significant association between ozone levels and asthma attacks.

It would have been nice to have the incredible diagnostics that are now produced when you run PROC REG. Imagine if I had SAS Studio back then!

In the program, I used a really interesting trick. (Thank you Paul Grant for showing me this trick so many years ago at a Boston Area SAS User Group meeting.) Here's the problem: there are many possible codes such as 493, 493.9, 493.100, 493.02, and so on that all relate to asthma. The straightforward way to check an ICD-9 code would be to use the SUBSTR function to pick off the first three digits of the code. But why be straightforward when you can be tricky or clever? (Remember Art Carpenter's advice to write clever code that no one can understand so they can't fire you!)

The following program demonstrates the =: operator:

*An interesting trick to read ICD codes;
data ICD_9;
  input ICD : $7. @@;
  if ICD =: "493" then output;
datalines;
493 770.6 999 493.9 493.90 493.100
;
title "Listing of All Asthma Codes";
proc print data=ICD_9 noobs;
run;

 

Normally, when SAS compares two strings of different length, it pads the shorter string with blanks to match the length of the longer string before making the comparison. The =: operator truncates the longer string to the length of the shorter string before making the comparison.
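A small side-by-side sketch may make the difference concrete. The data set name Compare and the flag variables Exact, Prefix, and Sub are made up for illustration; each flag is 1 when the comparison is true and 0 otherwise.

```sas
data Compare;
   input ICD : $7. @@;
   /* Ordinary equality: '493' is padded with blanks, so only the bare code matches */
   Exact  = (ICD = '493');
   /* Colon modifier: the longer value is truncated to 3 characters before comparing */
   Prefix = (ICD =: '493');
   /* The "straightforward" SUBSTR version mentioned earlier */
   Sub    = (substr(ICD, 1, 3) = '493');
datalines;
493 770.6 999 493.9 493.90 493.100
;
proc print data=Compare noobs;
run;
```

Exact is 1 only for the bare code 493, while Prefix and Sub agree on every row. Both prefix tests would also accept a value such as 4930, so this approach assumes the codes are well formed.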

The usual reason to write a SAS blog is to teach some aspect of SAS programming or to just point out something interesting about SAS. While that is usually my motivation, I have an ulterior motive in writing this blog – I want to plug a new book I have just published on Amazon. It's called 10-8 Awaiting Crew: Memories of a Volunteer EMT. One of the chapters discusses the difficulty of conducting statistical studies in pre-hospital settings. This was my first attempt at a non-technical book. I hope you take a look. (Enter "10-8 awaiting crew" or "Ron Cody" in Amazon search to find the book.) Drop me an email with your thoughts at ron.cody@gmail.com.

Using SAS to estimate the link between ozone and asthma (and a neat trick) was published on SAS Users.

Using the COMPRESS Function in SAS to Remove or Keep Specific Characters in a String

October 15, 2010
 

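An earlier post on string concatenation showed how functions such as trim() and strip() remove blanks before or after a string, but those functions cannot handle blanks in the middle of a string. The compress() function in SAS is mainly used to remove specific characters from a string. Many SAS users think it only removes blanks, but it does much more: it can remove or keep specific characters in a string. This post uses a few small examples to show how the function works.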

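1. Basic form of the compress() function

Syntax

compress(source <, chars> <, modifiers>)

Arguments

source: the source character string
chars: the list of characters to remove or keep; a literal list must be quoted
modifiers: one or more modifiers, not case sensitive, that control how compress() behaves; common modifiers and their meanings are listed at the end of this post

2. Examples of the compress() function

Given the data set have: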

data have;
	input char $20.;
cards;
Elek dot Me
Yuewei-Liu
HOME-027-8765 4321
;
run;

Example 1. Remove blanks: you can simply omit the second and third arguments, or explicitly put a blank in the character list, that is, the second argument.

data test;
	set have;
	char1=compress(char);
run;
data test;
	set have;
	char1=compress(char," ");
run;

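Example 2. Use a modifier to remove lowercase letters: setting the modifier to "l" (for lowercase) adds all lowercase letters to the list of characters to remove. Without the "l" modifier, you could list all the lowercase letters a-z in the character list with the same effect, but the modifier is clearly simpler. This example removes all lowercase letters and the uppercase "E" from the string.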

data test;
	set have;
	char1=compress(char,"E","l");
run;

data test;
	set have;
	char1=compress(char,"abcdefghijklmnopqrstuvwxyzE");
run;

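Example 3. Keep specified characters: the character list is defined as before; you only need to add "K" or "k" to the modifiers. The digits can be added either with the "d" modifier or by listing every digit in the character list. This example keeps all digits and the characters in "HOME".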

data test;
	set have;
	char1=compress(char,"HOME","dk");
run;

data test;
	set have;
	char1=compress(char,"HOME1234567890","k");
run;

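Appendix: common modifiers and their meanings

a/A  all Latin letters, a-z and A-Z
d/D  all digits
f/F  the underscore and all Latin letters
i/I  ignore the case of the characters to be removed or kept
k/K  keep the characters in the list instead of removing them
l/L  all lowercase Latin letters
n/N  the underscore, digits, and all Latin letters
s/S  space characters, such as blanks and tabs
t/T  trim trailing blanks from the first and second arguments
u/U  all uppercase Latin letters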
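
As a quick check of a few modifiers from the list, here is a minimal sketch. The data set name Modifiers and the variable names are made up for illustration; the test strings are taken from the data set have above.

```sas
data Modifiers;
   length s $20;
   s = 'HOME-027-8765 4321';
   keep_digits = compress(s, , 'kd');              /* k + d: keep digits only */
   keep_upper  = compress(s, , 'ku');              /* k + u: keep uppercase letters only */
   no_sep      = compress(s, '-', 's');            /* remove hyphens and space characters */
   no_e        = compress('Elek dot Me', 'e', 'i'); /* i: remove both e and E */
   put keep_digits= / keep_upper= / no_sep= / no_e=;
run;
```

The log should show keep_digits=02787654321, keep_upper=HOME, no_sep=HOME02787654321, and no_e=lk dot M.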




String Concatenation Methods in SAS

October 3, 2010
 

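String concatenation is an important step in processing character data, and SAS offers flexible ways to do it, including the concatenation operator and a family of functions. This post shares a few things I have learned about concatenating strings in SAS.

1. The concatenation operator. The operator can be written as ||, ¦¦, or !!. To get the result you want, you usually need to combine it with blank-handling functions such as trim(), left(), and strip().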

data have;
	x1="A B";
	x2="C ";
	x3="D   E";
	x4=" F";
	char1=x1||x2||x3||x4;
	char2=trim(x1)||trim(x2)||trim(x3)||trim(x4);
	char3=left(x1)||left(x2)||left(x3)||left(x4);
	char4=trim(left(x1))||trim(left(x2))||trim(left(x3))
			||trim(left(x4));
	char5=strip(x1)||strip(x2)||strip(x3)||strip(x4);
	char6=strip(x1)||"|"||strip(x2)||"|"||strip(x3)||"|"||strip(x4)||"|";
	put char1=/char2=/char3=/char4=/char5=/char6=;
run;

char1=A BC D   E F
char2=A BCD   E F
char3=A BC D   EF
char4=A BCD   EF
char5=A BCD   EF
char6=A B|C|D   E|F|

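This example uses several functions, all of the form Function(argument), where the argument is the string or variable name to process. The functions can be combined according to what each one does; for example, trim(left()) has much the same effect as strip():

  • trim() – removes trailing blanks from a string; if the string is blank, it returns a single blank
  • left() – moves leading blanks to the end of the string
  • strip() – removes all leading and trailing blanks from a string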
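
Wrapping each result in brackets makes the remaining blanks visible. This is a minimal sketch; the data set name Blanks and the variable names are made up for illustration.

```sas
data Blanks;
   x  = '  A B  ';                          /* 2 leading and 2 trailing blanks */
   t  = '[' || trim(x)       || ']';        /* trailing blanks removed          */
   l  = '[' || left(x)       || ']';        /* leading blanks moved to the end  */
   s  = '[' || strip(x)      || ']';        /* both ends removed                */
   tl = '[' || trim(left(x)) || ']';        /* same result as strip() here      */
   put t= / l= / s= / tl=;
run;
```

The log should show t=[  A B], l=[A B    ], s=[A B], and tl=[A B]. The blank inside "A B" is untouched by all three functions.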

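2. The CAT family of functions, including cat(), catt(), cats(), and catx()

  • cat() – concatenates strings much like the concatenation operator, keeping all leading and trailing blanks
  • catt() – concatenates strings after removing trailing blanks from each one, equivalent to the concatenation operator combined with trim()
  • cats() – concatenates strings after removing all leading and trailing blanks from each one, equivalent to the concatenation operator combined with strip()
  • catx() – concatenates strings after removing all leading and trailing blanks from each one, and inserts a specified string (the delimiter) between them
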
data have1;
	x1="A B";
	x2="C ";
	x3="D   E";
	x4=" F";
	char1=cat(x1,x2,x3,x4);
	char2=catt(x1,x2,x3,x4);
	char4=cats(x1,x2,x3,x4);
	char6=catx("|",x1,x2,x3,x4);
	put char1=/char2=/char4=/char6=;
run;

char1=A BC D   E F
char2=A BCD   E F
char4=A BCD   EF
char6=A B|C|D   E|F

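In this example, char1, char2, char4, and char6 give the same results as the corresponding variables in data have (except that catx() does not append a trailing delimiter, so char6 has no final "|"), but the statements are much simpler, especially when there are many variables. When you work with a series of variables, the CAT functions also accept variable lists with the OF keyword; any of the forms used for char1, char2, char4, and char6 below works: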

data have2;
	x1="A B";
	x2="C ";
	x3="D   E";
	x4=" F";
	array tmp[*] x1-x4;
	char1=cat(of x1-x4);
	char2=catt(of x1-character-x4);
	char4=cats(of x1--x4);
	char6=catx("|",of tmp[*]);
	put char1=/char2=/char4=/char6=;
run;

char1=A BC D   E F
char2=A BCD   E F
char4=A BCD   EF
char6=A B|C|D   E|F

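The CAT functions are among the most frequently used and most flexible functions in SAS, and I like them a lot. For example, the post Get specific variable name according to observation in SAS uses catx() together with the DATA step to concatenate strings and reach the final result.

This post was written in a hurry and leaves out some details. Anyone interested is welcome to join the discussion.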

