SAS provides an extensive set of tools for data cleansing and preparation – transforming data to a shape suitable for analysis, text mining, reporting, modeling and ultimately decision making.
In this post we will cover one of the common tasks of character data manipulation – inserting a substring into a SAS character string.
The diagram below illustrates what we are going to achieve:
SAS character strings come in two different incarnations: character variables and macro variables. Since these two are quite different SAS language objects, let’s cover them one at a time.
Inserting a substring into a character variable
Here is our task: we have a SAS character variable (string) and we want to insert into it the value of another character variable (substring), starting at a specified position.
Let’s say we have a string BASE into which we want to insert a COUNTRY name right before the word "stays" to produce different variations of the resulting phrase. Here is an example of how this can be done:
data COUNTRIES;
   length COUNTRY $20;
   input COUNTRY;
   datalines;
Spain
Argentina
Slovenia
Romania
USA
Luxembourg
Egypt
Switzerland
;

data NEW (keep=COUNTRY PHRASE);
   BASE = 'The rain in stays mainly in the plain';
   INSPOS = find(BASE,'stays');
   set COUNTRIES;
   length PHRASE $50;
   PHRASE = catx(' ',substr(BASE,1,INSPOS-1),COUNTRY,substr(BASE,INSPOS));
run;
This code dynamically creates variable PHRASE out of values of variable BASE and the values of variable COUNTRY, thus making it data-driven.
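After this code runs, the data set NEW contains one observation per country, each with its own PHRASE. For example, for COUNTRY = Spain the value of PHRASE is "The rain in Spain stays mainly in the plain".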
Here are the code highlights:
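- We use the find() function to determine the position, INSPOS, of the word "stays" within BASE.
- We use the substr() function to extract two parts of BASE, before and after the insertion point:
- substr(BASE,1,INSPOS-1) captures the first part of BASE (before insertion): substring of BASE starting from position 1 with a length of INSPOS-1.
- substr(BASE,INSPOS) captures the second part of BASE (after insertion): substring of BASE starting from position INSPOS through the end of the BASE value (since the third argument, length, is not specified).
- We use the catx() function to concatenate the first part, COUNTRY, and the second part, with a blank as the separator; catx() also strips leading and trailing blanks from each piece.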
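The data step above assumes that the word "stays" is always found in BASE. If it is not, find() returns 0 and the substr() calls receive invalid arguments, which SAS flags with notes in the log. The following is a minimal sketch of one way to guard against that, not part of the original example; the single hard-coded COUNTRY value and the fallback of leaving the phrase unchanged are assumptions made for illustration.

```sas
/* Sketch only: a guarded variant of the insertion logic.            */
/* COUNTRY is hard-coded here so that the step runs on its own.      */
data _null_;
   length BASE $40 COUNTRY $20 PHRASE $50;
   BASE    = 'The rain in stays mainly in the plain';
   COUNTRY = 'Spain';
   INSPOS  = find(BASE,'stays');        /* 0 when 'stays' is not found */
   if INSPOS > 1 then                   /* normal case: text exists before the word */
      PHRASE = catx(' ',substr(BASE,1,INSPOS-1),COUNTRY,substr(BASE,INSPOS));
   else if INSPOS = 1 then              /* word starts the string: nothing before it */
      PHRASE = catx(' ',COUNTRY,BASE);
   else                                 /* word not found: assumed fallback, keep BASE as is */
      PHRASE = BASE;
   put INSPOS= PHRASE=;
run;
```

With the values shown, the log should report INSPOS=13 together with the completed phrase.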
Inserting a substring into a SAS macro variable
Let’s solve a similar task, but now instead of SAS variables we will operate with SAS macro variables, since they are strings too.
Here is our problem to solve: we have a SAS macro variable (string) and we want to insert into it the value of another macro variable (substring), starting at a specified position.
Let’s say we have a macro variable BASE with the value "The rain in stays mainly in the plain", into which we want to insert a country name, held in macro variable COUNTRY with the value Spain, right before the word "stays". Here is an example of how this can be done:
%let BASE = The rain in stays mainly in the plain;
%let COUNTRY = Spain;
%let W = stays;
%let INSPOS = %index(&BASE,&W);
%let PHRASE = %substr(&BASE,1,%eval(&INSPOS-1))&COUNTRY %substr(&BASE,&INSPOS);
%put ***&PHRASE***;
This code inserts the country name at the appropriate place within the value of the BASE macro variable; the %put statement then prints the result to the SAS log:
***The rain in Spain stays mainly in the plain***
Here are the code highlights:
- We use the %substr() macro function to extract two parts of its first argument (&BASE), before and after the insertion point:
- %substr(&BASE,1,%eval(&INSPOS-1)) captures the first part of &BASE (before insertion): substring of &BASE starting from position 1 with a length of %eval(&INSPOS-1).
- %substr(&BASE,&INSPOS) captures the second part of &BASE (after insertion): substring of &BASE starting from the position &INSPOS till the end of &BASE (since the third argument is not specified).
- In case of macro variables, we don’t need any concatenation functions – we just list the component pieces of the macro variable value in a proper order with desired separators (blanks in this case).
NOTE: Unlike SAS data step variables, SAS macro variables do not need a declared length; their length is determined automatically by the assigned value. The maximum length of a SAS macro variable value is 65,534 bytes.
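To make the macro version data-driven in the same spirit as the data step example, the insertion can be repeated for each name in a blank-separated list. The sketch below is one possible way to do that; the macro name make_phrases and its COUNTRIES parameter are hypothetical and do not come from the original example, and the list is assumed to contain single-word names only.

```sas
/* Sketch only: make_phrases and its parameter are illustrative names. */
%macro make_phrases(countries);
   %local base inspos i country phrase;
   %let base   = The rain in stays mainly in the plain;
   %let inspos = %index(&base,stays);            /* position of the word stays */
   /* loop over the blank-separated list of names */
   %do i = 1 %to %sysfunc(countw(&countries,%str( )));
      %let country = %scan(&countries,&i,%str( ));
      /* same construction as above: first part, inserted value, second part */
      %let phrase = %substr(&base,1,%eval(&inspos-1))&country %substr(&base,&inspos);
      %put ***&phrase***;
   %end;
%mend make_phrases;

%make_phrases(Spain Argentina Slovenia)
```

Each pass through the loop should write one phrase to the log. Declaring the working variables with %local keeps them from overwriting macro variables of the same name outside the macro.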
Inserting multiple instances of a substring into a SAS character string
Sometimes you need to insert a substring into several places (positions p1, p2, …, pn) of a character string. In this case you can apply the above strategy repeatedly, with one caveat: start inserting at the highest position and move backwards to the lowest. This preserves your pre-determined positions, because positions are counted from left to right and inserting a substring at a higher position does not change any lower position number. If you insert at a lower position first, all higher positions shift to the right by the length of the inserted substring.
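The highest-to-lowest order can be sketched as follows. This is an illustration under stated assumptions rather than a general-purpose solution: the phone-number string, the dash, and the positions 4 and 7 are made-up sample values, and cats() is used instead of catx() because no separator is wanted between the pieces. Note that cats() strips leading and trailing blanks from each piece, so it suits strings without significant blanks.

```sas
/* Sketch only: sample values chosen for illustration. */
data _null_;
   length STR $20 SUB $1;
   STR = '8005551234';
   SUB = '-';
   array P[2] _temporary_ (4 7);        /* insertion positions p1 < p2 */
   do I = dim(P) to 1 by -1;            /* highest position first */
      /* insert SUB before the character currently at position P[I] */
      STR = cats(substr(STR,1,P[I]-1),SUB,substr(STR,P[I]));
   end;
   put STR=;
run;
```

Inserting at position 7 first gives 800555-1234, and position 4 still points at the same character, so the second insertion yields 800-555-1234. Processing the positions in ascending order would require adding the length of SUB to every remaining position after each insertion.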
Additional Resources for SAS character strings processing
- Removing repeated characters in SAS strings
- How to unquote SAS character variable values
- Expanding lengths of all character variables in SAS data sets
- Finding n-th instance of a substring within a string
Have you found this blog post useful? Please share your thoughts and feedback in the comments section below.