Oct 23, 2021

Rijkswaterstaat (RWS) is the Netherlands' main agency for the design, construction, management and maintenance of waterways and infrastructure. Their mission is to promote safety, mobility and quality of life in the Netherlands. They are the masterminds behind some of the most prestigious water projects in the world. In a recent panel [...]

A conversation with Rijkswaterstaat: How SAS is helping keep the Netherlands safe was published on SAS Voices by Olivia Ojeda

Oct 21, 2021

Companies look to the SAS Cloud and SAS Managed Application Services experts to ensure improved time to value for their analytic solutions. A large part of that value is keeping your data safe and secure in the SAS Cloud because let’s face it: Analytic solutions are only as good as [...]

Keeping your data safe and secure in the SAS Cloud was published on SAS Voices by Lindsay Marshall

Oct 20, 2021
As defined in the SAS® 9.4 Stored Processes: Developer's Guide, Third Edition, a stored process "is a SAS program that is stored on a server and defined in metadata, and which can be executed as requested by client applications." One of the benefits of using stored processes is that client applications always have the latest version of the code. In addition, stored processes provide enhanced security and application integrity.

Have you ever submitted a stored process, and instead of the expected output, you saw errors or no output at all? Depending on how you submit the stored process, various logs are available to assist you with debugging.

This article provides guidance for understanding which situations call for which logs, where to find each log, and what you should look for in each log. The article uses two clients as examples: SAS® Enterprise Guide® and the SAS® 9.4 Stored Process Web Application.

The article is divided into two sections:

  • Frequently Used Logs
  • Infrequently Used Logs

Frequently Used Logs

The logs that are described in this section are the ones most often requested when a stored process does not execute properly or when there seems to be a problem with the stored-process server.

SAS® Object Spawner log

What

The SAS Object Spawner log records each time that the spawner tries to start or stop a stored-process server, a workspace server, or a pooled workspace server. It also records when a request is redirected to a running stored-process server, to a pooled workspace server, or to a SAS® Grid Manager server.

The syntax for this log's name is ObjectSpawner_yyyy-mm-dd_machine-name_process-ID.log.
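The SAS server logs in this article share one naming scheme: a prefix, the date, the machine name, and the process ID. The following Python sketch splits such a name into those parts, which can help when you sort or filter a log directory. It is not a SAS tool. The regular expression assumes that the date is always yyyy-mm-dd and that the process ID is the last underscore-delimited field, so a machine name that contains underscores is still captured whole.

```python
import re
from datetime import date

# Assumed layout: <prefix>_<yyyy-mm-dd>_<machine-name>_<process-ID>.log
# The prefix match is lazy and the machine-name match is greedy, so underscores
# inside either part do not break the split.
LOG_NAME = re.compile(
    r"^(?P<prefix>.+?)_(?P<day>\d{4}-\d{2}-\d{2})_(?P<host>.+)_(?P<pid>\d+)\.log$"
)


def parse_log_name(name):
    """Return (prefix, date, machine name, process ID), or None if the name does not fit."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    return m["prefix"], date.fromisoformat(m["day"]), m["host"], int(m["pid"])


print(parse_log_name("ObjectSpawner_2021-10-20_sasapp01_4312.log"))
print(parse_log_name("SASApp_STPServer_2021-10-20_sasapp01_26834.log"))
```

The same function handles the stored-process server, workspace server, pooled workspace server, and metadata server log names that are described later in this article. The machine names and process IDs in the sample calls are made up.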

Where

The default locations for the object-spawner log are as follows:

  • Microsoft Windows operating environments: SAS-configuration-directory\Lev1\ObjectSpawner\Logs
  • UNIX operating environments: SAS-configuration-directory/Lev1/ObjectSpawner/Logs

If the logs are not in the directories that are listed above, you can check the logconfig.xml file that resides in the SAS-configuration-directory\Lev1\ObjectSpawner\ directory. That file contains a parameter called FileNamePattern that determines the location of the object-spawner log, as shown in this example:

<param name="FileNamePattern" value="/sas/config/Lev1/ObjectSpawner/Logs/ObjectSpawner_%d_%S{hostname}_%S{pid}.log"/>
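The following Python sketch shows how a FileNamePattern value maps to an actual file name by performing a simplified expansion. It handles only the three conversions in the example above. It also assumes that %d becomes the date in yyyy-mm-dd form, which matches the log-name syntax given earlier. The SAS logging facility supports more conversions than this hypothetical helper does.

```python
import re
from datetime import date


def expand_file_name_pattern(pattern, day, hostname, pid):
    """Hypothetical, simplified expansion of a FileNamePattern value.

    Handles only %d (date as yyyy-mm-dd), %S{hostname}, and %S{pid}.
    """
    values = {"hostname": hostname, "pid": str(pid)}
    # Substitute each %S{key} with its value; an unknown key raises KeyError.
    expanded = re.sub(r"%S\{(\w+)\}", lambda m: values[m.group(1)], pattern)
    return expanded.replace("%d", day.isoformat())


pattern = "/sas/config/Lev1/ObjectSpawner/Logs/ObjectSpawner_%d_%S{hostname}_%S{pid}.log"
print(expand_file_name_pattern(pattern, date(2021, 10, 20), "sasapp01", 4312))
```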

Why

Here are some reasons why you should check the object-spawner log:

  1. Performance problems occur within stored processes.
  2. Servers do not start or servers stop working. The stored process server, the workspace server, and the pooled workspace server are all started by the object spawner.
  3. Running a stored process returns no results and no errors.
  4. Users do not have permission to start the server or do not have ReadMetadata permission on their application server context (for example, the SASApp application server).
  5. The connecting user has to wait longer than the availability time-out of 60 seconds, and the stored process fails.
  6. A server is not available on which to run a stored process.

The following error is an example of one that you might find in this log:

The launch of the server process failed due to a problem with the processing of the SAS logging facility configuration file (LOGCONFIGLOC).

This error might occur because the file that is specified for the -LOGCONFIGLOC option cannot be processed: either the option value is invalid or the file cannot be accessed.

For more examples of errors that you might see in the object-spawner log, see the section Object Spawner Messages in "Appendix 1: Object Spawner and SAS OLAP Server Messages" in the SAS® 9.4 Intelligence Platform: Application Server Administration Guide.

You can also enable more detailed logging within the object-spawner log by following the steps in Enable More Detailed Logging for SAS Object Spawner Troubleshooting in "Chapter 10, Administering Logging for SAS Servers" of the SAS® 9.4 Intelligence Platform: System Administration Guide, Fourth Edition.

Example and possible cause

Let’s look at an example of how the object-spawner log can be helpful.

Suppose that the stored-process server fails to validate. Within the object-spawner log, you might see an error like the following:

Error authenticating user domain\sassrv in function LogonUser. Error 1326 (The user name or password is incorrect. ).
 
The credentials specified for the SASApp – Stored Process Server (A5WN99NR.AZ000007) server definition failed to authenticate. Therefore this server definition will not be included.

With this type of error, you might have an invalid sassrv password. You can check the password by trying to log on to the server directly with the sassrv user ID. If the logon is successful, open SAS® Management Console and verify that the password is correct in the metadata, as follows:

  1. In SAS® Management Console, select User Manager.
  2. Locate the SAS General Servers group. Then, right-click and select Properties.
  3. Click the Accounts tab.
  4. Select the sassrv user ID. Then click Edit.
  5. Update the password to what was successful on the operating system.
  6. Restart the object spawner.
  7. Check the logs again to verify that the error is resolved.

Object-spawner console log

What

The object-spawner console log, available only on UNIX platforms, does not actually contain information about the object spawner. Instead, this log contains STDERR and STDOUT messages about the applications that are launched by the object spawner. The syntax for this log's name is ObjectSpawner_console_machine-name.log.

Where

Under UNIX, the default location for the object-spawner console log is SAS-configuration-directory/Lev1/ObjectSpawner/Logs.

Why

Here are some reasons why you should check this log:

  • SAS modules are missing. The program tries to run SAS modules that are not present in your environment.
  • Possible memory issues occur.
  • Storage or space issues occur.
  • Possible encoding issues occur.
  • Library permission issues occur.

The following error is an example of one that you might find in the object-spawner console log:

ERROR: Could not find extension: (sasgis)

This error can occur when someone or some component runs code that either checks for the existence of certain SAS modules or that contains references to certain SAS modules that are not present in your environment. Because these modules do not exist, errors are written to the log. This behavior is expected, and it is not indicative of any larger issue. So, you can ignore these messages.
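A long console log is easier to read once the expected messages are set aside. The following Python sketch keeps every line that starts with ERROR: except the "Could not find extension" messages, which can be ignored as described above. The function name and the fixed ignore list are illustrative assumptions, not part of any SAS tool.

```python
import re

# Messages that are expected and safe to ignore, per the explanation above.
BENIGN = re.compile(r"^ERROR: Could not find extension:")


def actionable_errors(lines):
    """Return the ERROR: lines from a console log, minus the known-benign ones."""
    return [
        line.rstrip("\n")
        for line in lines
        if line.startswith("ERROR:") and not BENIGN.match(line)
    ]


sample = [
    "NOTE: a startup note\n",
    "ERROR: Could not find extension: (sasgis)\n",
    "ERROR: No space left on device\n",
]
print(actionable_errors(sample))
```

To scan a real console log, pass an open file object, for example inside `with open(path, errors="replace") as f:`, where path is the location of your ObjectSpawner_console_machine-name.log file.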

Example and possible cause

Let’s look at an example of how the object-spawner console log can be helpful.

Suppose that your program generates a segmentation violation, but there is no indication as to why that happens. Within the object-spawner console log, you might see an error like the following:

ERROR: No space left on device

This type of error indicates that you might be running out of disk space. Try adding more space, pointing to a network drive, or using smaller files.
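Before you rerun the program, you can check how much space is left on the file system that the program writes to. This Python sketch uses the standard shutil.disk_usage function. The 5 GB threshold and the choice of directory to check are assumptions that you would adjust for your site.

```python
import shutil


def low_on_space(path, min_free_gb=5.0):
    """Return (is_low, free_gb) for the file system that holds *path*.

    min_free_gb is an arbitrary, site-specific threshold.
    """
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    free_gb = usage.free / 1024**3
    return free_gb < min_free_gb, free_gb


is_low, free_gb = low_on_space(".")
print(f"free: {free_gb:.1f} GB, low: {is_low}")
```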

SAS® Stored Process Server log

You need to enable verbose logging to see every stored process that runs through the server. For more details about verbose logging, see SAS Note 34114, "Creating a detailed SAS® Stored Process Server log by default."

What

The syntax for the SAS Stored Process Server log name is SASApp_STPServer_yyyy-mm-dd_machine-name_process-ID.log.

Where

The default locations for the stored-process server log are as follows:

  • Windows: SAS-configuration-directory\Lev1\SASApp\StoredProcessServer\Logs
  • UNIX: SAS-configuration-directory/Lev1/SASApp/StoredProcessServer/Logs

Why

Here are the reasons that you might need to check this log:

  • Performance problems occur.
  • You do not receive any results. That is, the stored process runs, but it does not return an error, a warning, or the expected results.

Here is an example of an error that you might see in the log:

ERROR: A lock is not available for library.dataset.data
ERROR: Lock held by process 26834

This error indicates that the data set that is shown in the error cannot be locked for processing because the lock is held by another process. Further investigation of the stored-process server log that contains process ID 26834 shows the DATA step that is also reading from the same data set, which causes the lock.
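The investigation described above starts with the process ID in the lock message. The following Python sketch extracts that ID from the log text. It then lists the log files whose names end in that ID, following the SASApp_STPServer_yyyy-mm-dd_machine-name_process-ID.log syntax. The function name and the sample file names are hypothetical.

```python
import re

LOCK_HOLDER = re.compile(r"Lock held by process (\d+)")


def lock_holder_logs(log_text, log_names):
    """Return (process ID, matching log names) for the first lock message, or (None, [])."""
    m = LOCK_HOLDER.search(log_text)
    if m is None:
        return None, []
    pid = m.group(1)
    # Anchor on "_<pid>.log" so that, for example, 26834 does not match 126834.
    return int(pid), [n for n in log_names if n.endswith(f"_{pid}.log")]


text = "ERROR: A lock is not available for library.dataset.data\nERROR: Lock held by process 26834\n"
names = [
    "SASApp_STPServer_2021-10-20_sasapp01_26834.log",
    "SASApp_STPServer_2021-10-20_sasapp01_126834.log",
]
print(lock_holder_logs(text, names))
```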

Here is another error that you might see in this log:

ERROR: No logical assign for filename _WEBOUT.

Possible causes

The two main reasons for this error are as follows:

  • The %STPBEGIN and %STPEND macros are enabled.
  • The item Package is selected under the Result capabilities section (via Execution Options ► Result capabilities in either SAS Management Console or SAS Enterprise Guide).

The object-spawner console log is often needed in conjunction with the object-spawner log so that you can evaluate the communication between them.

SAS® Stored Process web-application log

What

The SAS Stored Process web-application log resides on the middle-tier machine, and the syntax for the log's name is SASStoredProcess9.4.log. (Note: The 9.4 in SASStoredProcess9.4.log will change with new releases.)

Where

The default locations for this log are as follows:

  • Windows: SAS-configuration-directory\Lev1\Web\Logs\SASServer1_1
  • UNIX: SAS-configuration-directory/Lev1/Web/Logs/SASServer1_1

Why

Here are some scenarios for which you might want to check this log:

  • HTTP errors are displayed in the browser when you submit the stored process from the SAS Stored Process Web Application.
  • Pages in the stored-process web application (for example, the welcome page, index page, or custom input form) do not load or they take a long time to load.
  • A dynamic prompt cannot load.

Examples

The following error occurs when a stored process uses dynamic prompts that are based on a data set and the SAS General Servers user group is denied ReadMetadata permission on that specific stored process in SAS Management Console:

2021-07-30 15:01:43,494 [tomcat-http--10] WARN [sasdemo] com.sas.prompts.valueprovider.dynamic.workspace.PromptColumnValueProvider - Cannot resolve data type
2021-07-30 15:01:43,505 [tomcat-http--10] ERROR [sasdemo] com.sas.prompts.valueprovider.dynamic.workspace.PromptColumnValueProvider - Unable to find physical table
com.sas.storage.exception.ServerConnectionException: Unable to find physical table
	at com.sas.prompts.valueprovider.dynamic.DataProviderUtil.getLibrary(DataProviderUtil.java:163)
	at com.sas.prompts.valueprovider.dynamic.workspace.PromptColumnValueProvider.setConnection(PromptColumnValueProvider.java:815)
	at com.sas.prompts.valueprovider.dynamic.workspace.PromptColumnValueProvider.getValuesAsList(PromptColumnValueProvider.java:647)
	at com.sas.prompts.valueprovider.dynamic.workspace.PromptColumnValueProvider.getValues(PromptColumnValueProvider.java:633)

The following pop-up message occurs because the SAS General Servers user group or the sasdemo user ID is denied permission to the data set that the prompts for a stored process need.

Unable to execute query: SQL passthru expression contained these errors: ERROR: File MYLIB.MYCLASS.DATA does not exist

When you run a stored process from the SAS Stored Process Web Application, it is often helpful to add debugging options to the end of the URL by using the reserved macro variable _DEBUG. The following example URL demonstrates how to use the TRACE and LOG options to obtain pertinent information in the log that is produced in the browser.

http://your.web.server:8080/SASStoredProcess/do?_program=/STP_Examples/test1&_debug=trace,log

For a complete list of _DEBUG= values, see List of Valid Debugging Keywords in "Chapter 7: Building a Web Application with SAS® Stored Processes" in the SAS® 9.4 Stored Processes: Developer's Guide, Third Edition.
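If you build these URLs in a script, let a URL library handle the encoding. The following Python sketch rebuilds the example URL above from its parts. The host name and program path are the placeholders from that example, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def stp_debug_url(base, program, debug_keywords):
    """Build a SAS Stored Process Web Application URL with _program and _debug."""
    query = urlencode(
        {"_program": program, "_debug": ",".join(debug_keywords)},
        safe="/,",  # leave slashes and commas unescaped, as in the example URL
    )
    return f"{base}/SASStoredProcess/do?{query}"


print(stp_debug_url("http://your.web.server:8080", "/STP_Examples/test1", ["trace", "log"]))
```

Spaces in a program path are encoded as +, which is the standard form for query strings.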

Workspace-server log

What

You must request logging for the workspace-server log because it is not enabled by default. The syntax for the log's name is SASApp_WorkspaceServer_yyyy-mm-dd_machine-name_process-ID.log.

Where

As mentioned earlier, you must request logging for this log, as follows:

  • Windows: In the sasv9_usermods.cfg file that resides in the SAS-configuration-directory\Lev1\SASApp\WorkspaceServer\ directory, add the following option to turn on logging:
    -logconfigloc "SAS-configuration-directory\Lev1\SASApp\WorkspaceServer\logconfig.trace.xml"

    This setting generates the log file in SAS-configuration-directory\Lev1\SASApp\WorkspaceServer\Logs.

  • UNIX: In the sasv9_usermods.cfg file that resides in the SAS-configuration-directory/Lev1/SASApp/WorkspaceServer/ directory, add the following option to turn on logging:

    -LOGCONFIGLOC "SAS-configuration-directory/Lev1/SASApp/WorkspaceServer/logconfig.trace.xml"

    This setting generates the log file in SAS-configuration-directory/Lev1/SASApp/WorkspaceServer/Logs.

Why

If you run the stored process from the SAS Stored Process Web Application with the server set to the workspace server, you need to consult this log for any errors or warnings. Running the stored process from SAS Enterprise Guide produces a log directly in the application. You need to consult the workspace server log in either of these circumstances:

  • when Default server is selected as the server type in the stored-process properties
  • when you submit the stored process from SAS Enterprise Guide

Pooled workspace-server log

What

The syntax for the pooled workspace-server log's name is SASApp_PooledWSServer_yyyy-mm-dd_machine-name_process-ID.log.

Where

The default locations for this log are as follows:

  • Windows: SAS-configuration-directory\Lev1\SASApp\PooledWorkspaceServer\Logs
  • UNIX: SAS-configuration-directory/Lev1/SASApp/PooledWorkspaceServer/Logs

Why

The pooled workspace server loads dynamic prompt values in the SAS Stored Process web application. So, you need this log if a dynamic prompt does not load.

SAS® Metadata Server log

What

The syntax for the SAS Metadata Server log's name is SASMeta_MetadataServer_yyyy-mm-dd_machine-name_process-ID.log.

Where

The default locations for the metadata-server log are as follows:

  • Windows: SAS-configuration-directory\Lev1\SASMeta\MetadataServer\Logs
  • UNIX: SAS-configuration-directory/Lev1/SASMeta/MetadataServer/Logs

Why

This log is helpful for issues with the LIBNAME META engine and metadata permissions.

The following error is written to the metadata-server log when no metadata identity is associated with the user name and password.

ERROR [00000779] 28:billyw - User Folders cannot be created or retrieved if connected user ID is not a person identity. Must be connected as a person identity or valid person name must be passed in request.

The following messages might also be written to this log when you try to open SAS Management Console and the user (tbanks in this example) does not have the Log on as a batch job user right at the operating-system level.

20100122:13.00.36.82: 00002410:NOTE:    User tbanks probably does not have the right to log on as a batch job.
20100122:13.00.36.82: 00002410:ERROR:   Error authenticating user tbanks in function LogonUser.  Error 1385 (Logon failure: the user has not been granted the requested logon type at this computer. ).
20100122:13.00.36.82: 00002410:ERROR:   Access denied.

Infrequently Used Logs

Windows Event Viewer log

The Windows Event Viewer contains several logs that are used less frequently than the logs described above. That fact in no way diminishes their usefulness when you are troubleshooting stored-process issues. SAS Technical Support will request these logs from you if they are needed for an issue.

Although the event viewer contains several logs, SAS Technical Support reviews only the application log, the system log, and the security log for errors that are related to start-up and other failures in the SAS Stored Process Server (or any SAS server).

If the sassrv user ID cannot write to the Logs directory, you receive a message in this log.

Server starts and stops are also recorded in these logs.

ERROR_yyyy-mm-dd-time.log

What

This web-server log is useful when you have catastrophic failures in loading or running a stored process in a SAS web application. The syntax for this log's name is ERROR_yyyy-mm-dd-time.log.

Where

The default locations for this log are as follows:

  • Windows: SAS-configuration-directory\Lev1\Web\WebServer\logs
  • UNIX: SAS-configuration-directory/Lev1/Web/WebServer/logs

Why

Here are some reasons why you should check this log:

  • Java errors are returned in the browser when you try to load or run a stored process using the SAS Stored Process Web Application.
  • Connection errors occur between the web server and the tc Server.

Here is an example of an error that you might find in this log:

[error] (OS 10061)No connection could be made because the target machine actively refused it.  : proxy: HTTP: attempt to connect to 192.168.x.xxx:8080 (machine-name) failed

This error might occur when your servers are down or when they are restarting and are not active yet.

Example and possible cause

Let’s look at an example of how the ERROR_yyyy-mm-dd-time log can be helpful.

Suppose that you try to run a stored process through the SAS Stored Process Web Application and you cannot load any SAS 9.4 web applications after you enter your user ID and password. If you check this log, you might see errors like the following:

No connection could be made because the target machine actively refused it. : proxy: HTTP: attempt to connect to 192.168.x.xxx:8080 (machine-name) failed
 
ap_proxy_connect_backend disabling worker for (machine-name)

In this case, the secondary middle-tier node is in a clustered environment, but it is configured incorrectly. For a circumvention, see SAS Note 55904, "You cannot access any SAS® 9.4 web applications when the secondary middle-tier node is in a clustered environment."

Localhost_access_log..yyyy-mm-dd.txt

What

This log shows the sequence and query strings of URLs that are submitted through the SAS Web Application Server (for all web applications that use this application server). This log lists activity sequentially and includes HTTP status codes for each URL. The syntax for this log's name is Localhost_access_log..yyyy-mm-dd.txt.

Where

The default locations for this log are as follows:

  • Windows: SAS-configuration-directory\Lev1\Web\WebAppServer\SASServer1_1\logs
  • UNIX: SAS-configuration-directory/Lev1/Web/WebAppServer/SASServer1_1/logs

Why

Here are some reasons why you should check this log:

  • A page cannot load or the URL is invalid.
  • Performance problems occur.
  • You need to trace stored-process activity after you click the Run button (in web applications only).

Here is an example of a message that you might see in this log:

"POST /SASLogon/v1/tickets HTTP/1.1" 404 796

Example and possible cause

Let’s look at an example of how this log can be helpful. Suppose that you cannot log on to the SAS Stored Process Web Application. If you check this log, you might see a message like the following:

"GET /SASStoredProcess/j_spring_cas_security_check?ticket=ST-9994-zpD2sFdBmayXEuoj4cya-cas HTTP/1.1" 401 802

In this example, the request returned HTTP status code 401 (Unauthorized).
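Access-log lines are easier to scan once the request and status code are extracted. The following Python sketch matches only the quoted request, status code, and size that appear at the end of the example lines in this section. That pattern is an assumption about the line format. Because the search is not anchored, real lines that begin with a client address and timestamp still match.

```python
import re

# "METHOD PATH PROTOCOL" STATUS SIZE, as in the example lines in this section.
REQUEST = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>HTTP/[\d.]+)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_request(line):
    """Return (method, path, status), or None if the line has no request part."""
    m = REQUEST.search(line)
    if m is None:
        return None
    return m["method"], m["path"], int(m["status"])


print(parse_request('"POST /SASLogon/v1/tickets HTTP/1.1" 404 796'))
```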

If the servers have just been restarted, try waiting a little longer for them to start up. If that is not the issue, the server might be down. The best approach is to stop and then restart the servers. Because of dependencies, it is important to start the servers in the correct order. You can find the correct order in Overview of Server Operation, in "Chapter 6: Operating Your Servers" of the SAS® Intelligence Platform: System Administration Guide, Fourth Edition.

Access_yyyy-mm-dd-time.log

What

This log shows the sequence and query strings of URLs that are submitted through the SAS Web Server. The syntax for the name of this log is access_yyyy-mm-dd-time.log.

Where

The default locations for this log are as follows:

  • Windows: SAS-configuration-directory\Lev1\Web\WebServer\logs
  • UNIX: SAS-configuration-directory/Lev1/Web/WebServer/logs

Why

Here are some reasons why you would check this log:

  • A connection error occurs when you submit a stored process.
  • Java errors occur in the browser.
  • When you evaluate a performance problem, timestamps in the log confirm when the web server received the request and what was in the request.
  • The output that is produced by the stored process does not match the expected output based on the prompt selections that are available when you submit the stored process.

Here is an example of a message that you might see in this log:

"GET /SASTheme_default/themes/ThemeXMLFiles.config HTTP/1.1" 503 299

A 503 (Service Unavailable) status code like this one might occur because the servers are down or are in the process of restarting and are not active yet.

Example and possible cause

Let’s look at an example of how the access_yyyy-mm-dd-time log can be helpful.

Suppose that you are trying to run a stored process through the SAS web application and you receive a connection error. If you check this log, you might see a message like the following:

"GET /SASLogon/proxy?pgt=TGT-6-aGakal9b2VGBg3dnNTbbwiVALOWqBSem9E3cVWehDD35vegmxI-cas&targetService=http%3A%2F%2FLazySecurityContext HTTP/1.1" 502 407

The message above shows a 502 HTTP return code. This error indicates that the server, while acting as a gateway or proxy, received an invalid response from the upstream server. If the servers were just restarted, try waiting a little longer for them to start back up. If that is not the issue, then a server might be down. The best approach is to stop and restart the servers. Because of dependencies, it is important to start the servers in the correct order. You can find the correct order in Overview of Server Operation, in "Chapter 6: Operating Your Servers" of the SAS® Intelligence Platform: System Administration Guide, Fourth Edition.
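When you evaluate an outage, a tally of status codes shows how many requests failed with 5xx codes such as the 502 and 503 codes in this section. The following Python sketch counts status codes in access-log lines. The reason phrases are the standard HTTP ones, and the helper names are hypothetical.

```python
import re
from collections import Counter

STATUS = re.compile(r'HTTP/[\d.]+" (\d{3})')
REASONS = {401: "Unauthorized", 404: "Not Found", 502: "Bad Gateway", 503: "Service Unavailable"}


def status_counts(lines):
    """Count HTTP status codes in access-log lines."""
    counts = Counter()
    for line in lines:
        m = STATUS.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


def server_errors(counts):
    """Total number of 5xx responses."""
    return sum(n for code, n in counts.items() if 500 <= code <= 599)


lines = [
    '"GET /SASTheme_default/themes/ThemeXMLFiles.config HTTP/1.1" 503 299',
    '"GET /SASLogon/proxy?pgt=TGT-6-aGakal9b2VGBg3dnNTbbwiVALOWqBSem9E3cVWehDD35vegmxI-cas&targetService=http%3A%2F%2FLazySecurityContext HTTP/1.1" 502 407',
    '"POST /SASLogon/v1/tickets HTTP/1.1" 404 796',
]
counts = status_counts(lines)
for code, n in sorted(counts.items()):
    print(code, REASONS.get(code, "?"), n)
print("5xx total:", server_errors(counts))
```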

Server.log

What

This log is helpful if a stored-process web application generates an HTTP, browser, or generic error in the browser when it loads a page or results that are returned from a stored process.

Where

The default locations for this log are as follows:

  • Windows: SAS-configuration-directory\Lev1\Web\WebAppServer\SASServer1_1\logs
  • UNIX: SAS-configuration-directory/Lev1/Web/WebAppServer/SASServer1_1/logs

Why

Here are some reasons why you would check this log:

  • The SAS Stored Process Web Application is not working.
  • Requests time out.
  • A configuration change is made to one of the web applications (for example, an increase to the time-out for the SAS Stored Process Web Application).
  • A generic error is displayed in the SAS Stored Process Web Application when you run a stored process.

Here is an example of a message that you might find in this log:

java.io.IOException: Cannot bind to URL

One possible reason for this type of error is network issues at the site where the stored process is run.

Example and possible cause

Let’s look at an example of how the server.log file can be helpful.

Suppose that you want to run a stored process through the SAS Web Application and you receive a generic error that says The system is experiencing problems. Please contact your system administrator. If you check this log, you might see a message like the following:

ERROR (ContainerBackgroundProcessor[StandardEngine[Catalina]]) [org.apache.catalina.core.ContainerBase] Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
java.lang.OutOfMemoryError: Java heap space

There is insufficient memory available to process the request. As a possible workaround, you can modify the setenv.bat file that resides in SAS-configuration-directory\Lev1\Web\WebAppServer\SASServer1_1\bin\ on Windows, or the setenv.sh file that resides in SAS-configuration-directory/Lev1/Web/WebAppServer/SASServer1_1/bin/ on UNIX. Edit the JVM_OPTS value by changing the -Xmx4096m -Xms1024m parameters to the following:

-Xmx8092m -Xms8092m

Because this change updates only the SAS Web Application Server script, only the SAS Web Application Server needs to be restarted.
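If you script this edit, for example across several middle-tier machines, change only the -Xmx and -Xms values and leave the rest of JVM_OPTS alone. The following Python sketch does that on a single line of text. The sample line is an assumption about how JVM_OPTS appears in setenv.sh, so check the actual file (and keep a backup) before you apply anything like this.

```python
import re


def set_heap(jvm_opts, xmx, xms):
    """Replace the -Xmx and -Xms values in a JVM options string.

    Sizes are matched as digits plus an optional k/m/g unit, so a closing
    quotation mark after the last option is preserved.
    """
    jvm_opts = re.sub(r"-Xmx\d+[kKmMgG]?", f"-Xmx{xmx}", jvm_opts)
    return re.sub(r"-Xms\d+[kKmMgG]?", f"-Xms{xms}", jvm_opts)


line = 'JVM_OPTS="$JVM_OPTS -Xmx4096m -Xms1024m"'  # hypothetical sample line
print(set_heap(line, "8092m", "8092m"))
```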

Conclusion

Hopefully, this review of logs that are related to debugging stored-process issues will assist your troubleshooting efforts. When you request help from SAS Technical Support, these are some of the logs that you will be asked to send so that Technical Support can better determine the cause of your problem.


Debugging a stored-process problem was published on SAS Users.

Oct 20, 2021

A genetic algorithm (GA) is a heuristic optimization technique. The method tries to mimic natural selection and evolution by starting with a population of random candidates. Candidates are evaluated for "fitness" by plugging them into the objective function. The characteristics of the better candidates are combined to create a new set of candidates. Some of the new candidates experience mutations. Eventually, over many generations, a GA can produce candidates that approximately solve the optimization problem. This article uses SAS/IML to implement a genetic algorithm in SAS.

A previous article discusses the mutation and crossover operators, which are important in implementing a genetic algorithm. In previous articles, the solution vectors were represented by column vectors. In the GA routines, candidates are row vectors. The population is a matrix, where each row represents an individual.

A GA is best illustrated by using an example. This article uses the binary knapsack problem. In the knapsack problem, a knapsack can hold W kilograms. There are N objects, each with a different value and weight. You want to maximize the value of the objects you put into the knapsack without exceeding the weight limit. A solution to the knapsack problem is a 0/1 binary vector b. If b[i]=1, the i_th object is in the knapsack; if b[i]=0, it is not. Although the knapsack problem can be solved by using a constrained linear program, this article solves an unconstrained problem by using an objective function that penalizes a candidate if it exceeds the weight limit.
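Written out, the penalized objective that the program below maximizes is as follows. Here v and w are the value and weight vectors, W is the weight limit, and λ is the penalty factor (lambda in the program). The formula restates the ObjFun module without adding anything new. The penalty is zero when the weight limit is respected, and it grows quadratically with the excess weight.

```latex
\max_{b \in \{0,1\}^N} f(b), \qquad
f(b) = \sum_{i=1}^{N} v_i b_i \;-\; \lambda \left[ \max\!\left(0,\; \sum_{i=1}^{N} w_i b_i - W \right) \right]^2
```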

The main steps in a genetic algorithm

The SAS/IML User's Guide provides an overview of genetic algorithms. The five main steps follow:

  • Encoding: Each potential solution is represented as a chromosome, which is a vector of values. For the knapsack problem, each chromosome is an N-dimensional vector of binary values.
  • Fitness: Choose a function to assess the fitness of each candidate chromosome. This is usually the objective function for unconstrained problems, or a penalized objective function for problems that have constraints. The fitness of a candidate determines the probability that it will contribute its genes to the next generation of candidates.
  • Selection: Choose which candidates become parents to the next generation of candidates.
  • Crossover (Reproduction): Choose how to produce children from parents.
  • Mutation: Choose how to randomly mutate some children to introduce additional diversity.

Encoding a solution vector

Each potential solution is represented as an N-dimensional vector of values. For the knapsack problem, you can choose a binary vector. In SAS/IML, you use the GASETUP function to define the encoding for a problem. The SAS/IML language supports four different encodings. The knapsack problem can be encoded by using an integer vector, as follows:

/* Individuals are ROW vectors. Population is a matrix of stacked rows. */
proc iml;
call randseed(12345);
/* Solve the knapsack problem: max Value*b  subject to Weight*b <= WtLimit */
/* Item:  1 2 3 4 5   6   7   8   9  10  11  12  13  14  15  16  17 */
Weight = {2 3 4 4 1.5 1.5 1   1   1   1   1   1   1   1   1   1   1}`;
Value  = {6 6 6 5 3.1 3.0 1.5 1.3 1.2 1.1 1.0 1.1 1.0 1.0 0.9 0.8 0.6}`;
WtLimit = 9;                                 /* weight limit */
N = nrow(Weight);
 
/* set up an encoding for the GA */
id = gaSetup(2,                 /* 2-> integer vector encoding */
             nrow(weight),      /* size of vector */
             123);              /* internal seed for GA */

The GASETUP function returns an identifier for the problem. This identifier must be used as the first argument to subsequent calls to GA routines. It is possible to have a program that runs several GAs, each with its own identifier. The GASETUP call specifies that candidate vectors are integer vectors. Later in the program, you can tell the GA that they are binary vectors.

Fitness, mutation, and reproduction

In previous articles, I discussed fitness, mutation, and crossover functions.

The following SAS/IML statements define the fitness module (ObjFun), the mutation module (Mutate), and the crossover module (Cross) and register these modules with the GA system.

/* just a few of the many hyperparameters */
lambda = 100;       /* factor to penalize exceeding the weight limit */
ProbMut= 0.2;       /* probability that the i_th site is mutated */
ProbCross = 0.3;    /* children based on 30%-70% split of parents */
 
/* b is a binary row vector */
start ObjFun( b ) global(Weight, Value, WtLimit, lambda);
   wsum = b * Weight;
   val  = b * Value;
   if wsum > WtLimit then                      /* penalize if weight limit exceeded */
      val = val - lambda*(wsum - WtLimit)##2;  /* subtract b/c we want to maximize value */
   return(val);
finish;
 
/* Mutation operator for a binary vector, b. */
start Mutate(b) global(ProbMut);
   N = ncol(b);
   k = max(1, randfun(1, "Binomial", ProbMut, N)); /* how many sites? */
   j = sample(1:N, k, "NoReplace");                /* choose random elements */
   b[j] = ^b[j];                                   /* mutate these sites */
finish;
 
/* Crossover operator for a pair of parents. */
start Cross(child1, child2, parent1, parent2) global(ProbCross);
   b = j(ncol(parent1), 1);
   call randgen(b, "Bernoulli", ProbCross); /* 0/1 vector */
   idx = loc(b=1);                          /* locations to cross */
   child1 = parent1;
   child2 = parent2;
   if ncol(idx)>0 then do;                  /* exchange values */
      child1[idx] = parent2[idx];
      child2[idx] = parent1[idx];
   end;
finish;
 
/* register these modules so the GA can call them as needed */
call gaSetObj(id, 1, "ObjFun"); /* 1->maximize objective module */
call gaSetCro(id,
              1.0,              /* hyperparameter: crossover probability */
              0, "Cross");      /* user-defined crossover module */
call gaSetMut(id,
              0.20,             /* hyperparameter: mutation probability */
              0, "Mutate");     /* user-defined mutation module */

Fitness and selection

A genetic algorithm pits candidates against each other according to Darwin's observation that individuals who are fit are more likely to pass on their characteristics to the next generation.

A genetic algorithm evolves the population across many generations. The individuals who are more fit are more likely to "reproduce" and send their progeny to the next round. To preserve the characteristics of the very best individuals (called elites), some individuals are "cloned" and passed unchanged to the next generation. After many rounds, the population is fitter, on average, and the elite individuals are the best solutions to the objective function that have been found so far.

In SAS/IML, you can use the GASETSEL subroutine to specify the rules for selecting elite individuals and for selecting which individuals are eligible to reproduce. The following call specifies that 3 elite individuals are "cloned" for the next generation. For the remaining individuals, pairs are selected at random. With 95% probability, the more fit individual in each pair is selected to reproduce:

call gaSetSel(id,
              3,                /* hyperparameter: carry k elites directly to next generation */
              1,                /* dual tournament */
              0.95);            /* best-player-wins probability */
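The selection rule can be sketched in a few lines of Python. This is one plausible reading of "carry k elites, then run dual tournaments with a best-player-wins probability," written for illustration only; it is not the internal algorithm that GASETSEL uses, and the function name select_indices is hypothetical. It returns population indices so that duplicate chromosomes cannot confuse the bookkeeping.

```python
import random


def select_indices(scores, n_elite=3, p_best=0.95, rng=random):
    """Return (elite_idx, parent_idx) for one generation.

    elite_idx:  indices of the n_elite highest scores (cloned unchanged)
    parent_idx: one dual-tournament winner for each remaining slot
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    elite_idx = order[:n_elite]
    parent_idx = []
    for _ in range(len(scores) - n_elite):
        i, j = rng.sample(range(len(scores)), 2)   # two distinct random players
        better, worse = (i, j) if scores[i] >= scores[j] else (j, i)
        # the fitter player wins with probability p_best
        parent_idx.append(better if rng.random() < p_best else worse)
    return elite_idx, parent_idx


scores = [5, 1, 9, 3, 7, 2, 8, 0, 6, 4]   # hypothetical fitness values
elites, parents = select_indices(scores, n_elite=3, p_best=0.95,
                                 rng=random.Random(2021))
print(elites)   # [2, 6, 4]: the indices of scores 9, 8, and 7
```

With p_best=1.0 the least-fit individual can never win a tournament, so it never becomes a parent; a value below 1 occasionally lets a weaker candidate through, which is one way to keep some genetic diversity.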

At this point, the GA problem is fully defined.

Initial population and evolution

You can use the GAINIT subroutine to generate the initial population. Typically, the initial population is generated randomly. If there are bounds constraints on the solution vector, you can specify them. For example, the following call generates a very small population of 10 random individuals. The chromosomes are constrained to be binary by specifying a lower bound of 0 and an upper bound of 1 for each element of the integer vector.

/* example of very small population; only 10 candidates */
call gaInit(id,  10,            /* initial population size */
            j(1,  nrow(weight), 0) //   /* lower and upper bounds of binary vector */
            j(1,  nrow(weight), 1) );
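Conceptually, the bounds passed to GAINIT constrain each gene to an integer range. A hypothetical Python equivalent of that random, bounded initialization is shown below; the name init_population is illustrative, and Python's random stream will not match SAS's.

```python
import random


def init_population(size, lower, upper, rng=random):
    """Return `size` random integer chromosomes with lower[i] <= gene i <= upper[i]."""
    return [[rng.randint(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(size)]


n_items = 17                                   # length of the binary vector
pop = init_population(10, [0] * n_items, [1] * n_items, rng=random.Random(12345))
print(len(pop), len(pop[0]))                   # 10 17
```

Because random.randint includes both endpoints, bounds of 0 and 1 produce exactly the binary chromosomes that the knapsack encoding needs.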

You can evolve the population by calling the GAREGEN subroutine. The GAREGEN call selects individuals to reproduce according to the tournament rules. The selected individuals become parents. Pairs of parents produce children according to the crossover operation. Some children are mutated according to the mutation operator. The children become the next generation and replace the parents as the "current" population.

At any time, you can use the GAGETMEM subroutine to obtain the members of the current population. You can use the GAGETVAL subroutine to obtain the fitness scores of the current population. Let's manually call the GAREGEN subroutine a few times and use the GAGETVAL subroutine after each call to see how the population evolves:

call gaGetVal(f0, id);     /* initial generation is random */ 
call gaRegen(id);          /* create the next generation via selection, crossover, and mutation */
call gaGetVal(f1, id);     /* evaluate fitness */
call gaRegen(id);          /* continue for additional generations ...*/
call gaGetVal(f2, id);
 
/* print fitness for each generation and top candidates so far */
print f0[L="Initial Fitness"] f1[L="1st Gen Fitness"] f2[L="2nd Gen Fitness"];
call gaGetMem(best, val, id, 1:3);
print best[f=1.0 L="Best Members"], val[L="Best Values"];

The output shows how the population evolves for three generations. Initially, only one member of the population satisfies the weight constraints of the knapsack problem. The feasible solution has a positive fitness score (9.7); the infeasible solutions are negative for this problem. After the selection, crossover, and mutation operations, the next generation has two feasible solutions (9.7 and 8.1). After another round, the population has six feasible solutions and the score for the best solution has increased to 11.7. The population is becoming more fit, on average.

After the second call to GAREGEN, the GAGETMEM call gets the candidates in positions 1:3. Recall that you specified three "elite" members in the GASETSEL call. The elite members are therefore placed at the top of the population. (The remaining individuals are not sorted according to fitness.) The chromosomes for the elite members are shown. In this encoding, each chromosome is a 0/1 binary vector that determines which objects are placed in the knapsack and which are left out.

Evolution towards a solution

The previous section purposely used a small population of 10 individuals. Such a small population lacks genetic diversity and might not converge to an optimal solution (or at least not quickly). A more reasonable population contains 100 or more individuals. You can use the GAINIT call a second time to reinitialize the population. Now that you have experience with the GAREGEN and GAGETVAL calls, you can use those calls in a DO loop to iterate over many generations. The following statements iterate the initial population through 15 generations. For each generation, the program records the best fitness score (f[1]) and the median fitness score.

/* for genetic diversity, better to have a larger population */
call gaInit(id,  100,                   /* initial population size */
            j(1,  nrow(weight), 0) //   /* lower and upper bounds of binary vector */
            j(1,  nrow(weight), 1) );   
/* record the best and median scores for each generation */
niter = 15;
summary = j(niter,3);
summary[,1] = t(1:niter);   /* (Iteration, Best Value, Median Value) */
do i = 1 to niter;
   call gaRegen(id);
   call gaGetVal(f, id);
   summary[i,2] = f[1];   summary[i,3] = median(f);
end;
print summary[c = {"Iteration" "Best Value" "Median Value"}];

The output shows that the fitness of the best candidate never decreases, which is expected because the elite members are carried forward unchanged. It increases from an initial value of 16.3 to the final value of 19.6, which is the optimal value. Similarly, the median value tends to increase. Because of random mutation and the crossover operations, statistics such as the median or mean are usually not monotonic. Nevertheless, the fitness of later generations tends to be better than for earlier generations.
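To see the whole loop in one place, here is a compact Python sketch that combines elitism, dual-tournament selection, uniform crossover, and binomial-count mutation, and records the best and median fitness for 15 generations. It is an illustration under stated assumptions, not a reimplementation of GAREGEN: the ten-item knapsack data are hypothetical, crossover is applied to every pair (mirroring the crossover probability of 1.0 above), and each child is mutated with probability 0.20. Because the three elites are copied unchanged, the best value in the summary cannot decrease, while the median is free to move in either direction.

```python
import random
import statistics

# Hypothetical 10-item knapsack instance (not the article's data).
WEIGHT = [3, 4, 5, 8, 9, 2, 6, 7, 1, 4]
VALUE = [4, 5, 7, 10, 11, 3, 8, 9, 2, 6]
LIMIT = 20
LAM = 100.0  # penalty factor for exceeding the weight limit


def fitness(b):
    """Penalized value of a 0/1 chromosome (same idea as ObjFun)."""
    w = sum(bi * wi for bi, wi in zip(b, WEIGHT))
    v = sum(bi * vi for bi, vi in zip(b, VALUE))
    return v - LAM * (w - LIMIT) ** 2 if w > LIMIT else v


def mutate(b, p_site, rng):
    """Flip k = max(1, Binomial(N, p_site)) distinct sites in place."""
    k = max(1, sum(rng.random() < p_site for _ in b))
    for j in rng.sample(range(len(b)), k):
        b[j] = 1 - b[j]


def cross(p1, p2, p_cross, rng):
    """Uniform crossover: swap each site with probability p_cross."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(p1)):
        if rng.random() < p_cross:
            c1[i], c2[i] = p2[i], p1[i]
    return c1, c2


def tournament(pop, fit, p_best, rng):
    """Dual tournament: the fitter of two random players wins with prob p_best."""
    i, j = rng.sample(range(len(pop)), 2)
    better, worse = (i, j) if fit[i] >= fit[j] else (j, i)
    return pop[better if rng.random() < p_best else worse]


def regen(pop, rng, n_elite=3, p_best=0.95, p_cross=0.3,
          p_child_mut=0.2, p_site=0.2):
    """Build the next generation: clone elites, then select, cross, and mutate."""
    fit = [fitness(b) for b in pop]
    order = sorted(range(len(pop)), key=lambda i: fit[i], reverse=True)
    nxt = [list(pop[i]) for i in order[:n_elite]]   # elites copied unchanged
    while len(nxt) < len(pop):
        c1, c2 = cross(tournament(pop, fit, p_best, rng),
                       tournament(pop, fit, p_best, rng), p_cross, rng)
        for child in (c1, c2):
            if rng.random() < p_child_mut:          # mutate some children
                mutate(child, p_site, rng)
            if len(nxt) < len(pop):
                nxt.append(child)
    return nxt


rng = random.Random(12345)
pop = [[rng.randint(0, 1) for _ in WEIGHT] for _ in range(100)]  # 100 candidates
summary = []  # rows of (iteration, best value, median value)
for it in range(1, 16):
    pop = regen(pop, rng)
    f = [fitness(b) for b in pop]
    summary.append((it, max(f), statistics.median(f)))

for row in summary:
    print(row)
```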

You can examine the chromosomes of the elite candidates in the final generation by using the GAGETMEM subroutine:

/* print the top candidates */
call gaGetMem(best, f, id, 1:3);
print best[f=1.0 L="Best Member"], f[L="Final Best Value"];

The output confirms that the best candidate is the same binary vector as was found by using a constrained linear program in a previous article.

The GA maintains an internal state, which enables you to continue iterating if the current generation is not satisfactory. In this case, the problem is solved, so you can use the GAEND subroutine to release the internal memory and resources that are associated with this GA. After you call GAEND, the identifier becomes invalid.

call gaEnd(id);   /* free the memory and internal resources for this GA */

The advantages and disadvantages of genetic algorithms

As with any tool, it is important to recognize that a GA has strengths and weaknesses. Strengths include:

  • A GA is amazingly flexible. It can be used to solve a wide variety of optimization problems.
  • A GA can provide useful suboptimal solutions. The elite members of a population might be "good enough," even if they are not optimal.

Weaknesses include:

  • A GA is dependent on random operations. If you change the random number seed, you might obtain a completely different solution or no solution at all.
  • A GA can take a long time to produce an optimal solution. It does not tell you whether a candidate is optimal, only that it is the "most fit" so far.
  • A GA requires many heuristic choices. It is not always clear how to implement the mutation and crossover operators or how to implement the tournament that selects individuals to be parents.

Summary

In summary, this article shows how to use low-level routines in SAS/IML software to implement a genetic algorithm. Genetic algorithms can solve optimization problems that are intractable for traditional mathematical optimization algorithms. Like all tools, a GA has strengths and weaknesses. By gaining experience with GAs, you can build intuition about when and how to apply this powerful method.

For other ways to use genetic algorithms in SAS, see the GA procedure in SAS/OR software and the black-box solver in PROC OPTMODEL.

The post An introduction to genetic algorithms in SAS appeared first on The DO Loop.

October 20, 2021
 

In this Q&A with MIT/SMR Connections, Iain Brown, SAS’s head of data science for the United Kingdom and Ireland, discusses some key risks, ethical issues, and platform questions that organizations should consider before adopting AI and takes a quick look at current and emerging AI trends. Q: In your view, [...]

Assessing AI readiness: Planning for today and tomorrow was published on SAS Voices by Kimberly Nevala

October 19, 2021
 

The social and economic impact of COVID-19 has dramatically affected supply chains and demand planning across all industries. Then there’s the Amazon effect, which has led to sky-high consumer expectations of the ordering and delivery process. Demand planners for retailers and consumer goods companies have quickly realized they have no [...]

What does it take to become an analytic-driven demand planning organization? was published on SAS Voices by Charlie Chase

October 18, 2021
 

This article uses an example to introduce genetic algorithms (GAs) for optimization. It discusses two operators (mutation and crossover) that are important in implementing a genetic algorithm, and it discusses the choices that you must make when you implement these operations.

Some programmers love using genetic algorithms. Genetic algorithms are heuristic methods that can be used to solve problems that are difficult to solve by using standard discrete or calculus-based optimization methods. A genetic algorithm tries to mimic natural selection and evolution by starting with a population of random candidates. Candidates are evaluated for "fitness" by plugging them into the objective function. The better candidates are combined to create a new set of candidates. Some of the new candidates experience mutations. Eventually, over many generations, a GA can produce candidates that approximately solve the optimization problem. Randomness plays an important role. Re-running a GA with a different random number seed might produce a different solution.

Critics of genetic algorithms note two weaknesses of the method. First, you are not guaranteed to get the optimal solution. However, in practice, GAs often find an acceptable solution that is good enough to be used. The second complaint is that the user must make many heuristic choices about how to implement the GA. Critics correctly note that implementing a genetic algorithm is as much an art as it is a science. You must choose values for hyperparameters and define operators that are often based on a "feeling" that these choices might result in an acceptable solution.

This article discusses two fundamental parts of a genetic algorithm: the crossover and the mutation operators. The operations are discussed by using the binary knapsack problem as an example. In the knapsack problem, a knapsack can hold W kilograms. There are N objects, each with a different value and weight. You want to maximize the value of the objects you put into the knapsack without exceeding the weight. A solution to the knapsack problem is a 0/1 binary vector b. If b[i]=1, the i_th object is in the knapsack; if b[i]=0, it is not.
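To make the encoding concrete, here is a brute-force Python sketch that checks every 0/1 vector b for a tiny hypothetical instance (four items, capacity 5). It only illustrates what a solution vector means; the function name and data are illustrative. Exhaustive search examines 2^N candidates, so the work doubles with every additional item.

```python
from itertools import product


def best_knapsack(weight, value, capacity):
    """Exhaustively search all 0/1 vectors b; return (best value, best b)."""
    best_val, best_b = 0, (0,) * len(weight)
    for b in product((0, 1), repeat=len(weight)):   # all 2**N binary vectors
        w = sum(bi * wi for bi, wi in zip(b, weight))
        if w <= capacity:                           # respect the weight limit
            v = sum(bi * vi for bi, vi in zip(b, value))
            if v > best_val:
                best_val, best_b = v, b
    return best_val, best_b


print(best_knapsack([2, 3, 4, 5], [3, 4, 5, 6], 5))   # (7, (1, 1, 0, 0))
```

Here b = (1, 1, 0, 0) means that the first two objects go into the knapsack and the last two are left out.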

A brief overview of genetic algorithms

The SAS/IML User's Guide provides an overview of genetic algorithms. The main steps in a genetic algorithm are as follows:

  • Encoding: Each potential solution is represented as a chromosome, which is a vector of values. The values can be binary, integer-valued, or real-valued. (The values are sometimes called genes.) For the knapsack problem, each chromosome is an N-dimensional vector of binary values.
  • Fitness: Choose a function to assess the fitness of each candidate chromosome. This is usually the objective function for unconstrained problems, or a penalized objective function for problems that have constraints. The fitness of a candidate determines the probability that it will contribute its genes to the next generation of candidates.
  • Selection: Choose which candidates become parents to the next generation of candidates.
  • Crossover (Reproduction): Choose how to produce children from parents.
  • Mutation: Choose how to randomly mutate some children to introduce additional diversity.

This article discusses the crossover and the mutation operators.

The mutation operator

The mutation operator is the easiest operation to understand. In each generation, some candidates are randomly perturbed. By chance, some of the mutations might be beneficial and make the candidate more fit. Others are detrimental and make the candidate less fit.

For a binary chromosome, a mutation consists of changing the parity of some proportion of the elements. The simplest mutation operation is to always change k random elements for some hyperparameter k < N. A more realistic mutation operation is to choose the number of sites randomly according to a binomial probability distribution with hyperparameter pmut. Then k is a random variable that differs for each mutation operation.

The following SAS/IML program chooses pmut=0.2 and defines a subroutine that mutates a binary vector, b. In this example, there are N=17 items that you can put into the knapsack. The subroutine first uses the Binom(pmut, N) probability distribution to obtain a random number of sites, k, to mutate. (But if the distribution returns 0, set k=1.) The SAMPLE function then draws k random positions (without replacement), and the values in those positions are changed.

proc iml;
call randseed(12345);
N = 17;                      /* size of binary vector */
ProbMut= 0.2;                /* mutation in 20% of sites */
 
/* Mutation operator for a binary vector, b. 
   The number of mutation sites k ~ Binom(ProbMut, N), but not less than 1. 
   Randomly sample (without replacement) k sites. 
   If an item is not in knapsack, put it in; if an item is in the sack, take it out. */
start Mutate(b) global(ProbMut);
   N = nrow(b);
   k = max(1, randfun(1, "Binomial", ProbMut, N)); /* how many sites? */
   j = sample(1:N, k, "NoReplace");                /* choose random elements */
   b[j] = ^b[j];                                   /* mutate these sites */
finish;
 
Items = 5:12;                  /* choose items 5-12 */
b = j(N,1,0);  b[Items] = 1;
bOrig = b;
run Mutate(b);
print (bOrig`)[L="Original b" c=(1:N)]
      (b`)[L="Randomly Mutated b" c=(1:N)];

In this example, the original chromosome has a 1 in locations 5:12. The binomial distribution randomly decides to mutate k=4 sites. The SAMPLE function randomly chooses the locations 3, 11, 15, and 17. The parity of those sites is changed. This is seen in the output, which shows that the parity of these four sites differs between the original and the mutated b vector.
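A Python sketch of the same mutation operator follows. The Binomial(N, pmut) count is drawn as the number of successes in N Bernoulli trials, and random.sample plays the role of SAMPLE(..., "NoReplace"). Python's random stream differs from SAS's, so the flipped sites will not be 3, 11, 15, and 17; the name mutate is illustrative.

```python
import random


def mutate(b, p_mut, rng=random):
    """Flip k random sites of the 0/1 list b in place, k = max(1, Binomial(N, p_mut)).
    Return the sorted list of flipped positions (0-based)."""
    n = len(b)
    k = max(1, sum(rng.random() < p_mut for _ in range(n)))  # binomial via n Bernoulli trials
    sites = rng.sample(range(n), k)                          # sample without replacement
    for j in sites:
        b[j] = 1 - b[j]                                      # 0 -> 1, 1 -> 0
    return sorted(sites)


N = 17
b_orig = [1 if 5 <= i <= 12 else 0 for i in range(1, N + 1)]  # items 5-12, as above
b = list(b_orig)
flipped = mutate(b, 0.2, random.Random(12345))
print([j + 1 for j in flipped])   # 1-based positions that changed
```

With N=17 and pmut=0.2, the binomial draw averages N·pmut = 3.4 sites and equals 0 with probability 0.8^17 ≈ 0.0225, so the max(1, k) guard changes the outcome only about 2% of the time.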

Notice that you must choose HOW the mutation operator works, and you must choose a hyperparameter that determines how many sites get mutated. The best choices depend on the problem you are trying to solve. Typically, you should choose a small value for the probability pmut so that only a few sites are mutated.

In the SAS/IML language, there are several built-in mutation operations that you can use. They are discussed in the documentation for the GASETMUT subroutine.

The crossover operator

The crossover operator is analogous to the creation of offspring through sexual reproduction. You, as the programmer, must decide how the parent chromosomes, p1 and p2, will combine to create two children, c1 and c2. There are many choices you can make. Some reasonable choices include:

  • Randomly choose a location s, 1 ≤ s ≤ N. You then split the parent chromosomes at that location and exchange and combine the left and right portions of the parents' chromosomes. One child chromosome is c1 = p1[1:s] // p2[s+1:N] and the other is c2 = p2[1:s] // p1[s+1:N]. Note that each child gets some values ("genes") from each parent.
  • Randomly choose a location s, 1 ≤ s ≤ N. Divide the first chromosome into subvectors of length s and N-s. Divide the second chromosome into subvectors of length N-s and s. Exchange the subvectors of the same length to form the child chromosomes.
  • Randomly choose k locations. Exchange the values at those locations between the parents to form the child chromosomes.
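The first option in the list, single-point crossover, is not implemented in this article, but a Python sketch shows how little code it needs. The function name one_point_cross is hypothetical; s is drawn uniformly from 1 to N, and the slices p1[:s] and p2[s:] correspond to p1[1:s] and p2[s+1:N] in the 1-based notation above.

```python
import random


def one_point_cross(p1, p2, rng=random):
    """Split both parents after position s (1 <= s <= N) and swap the tails."""
    n = len(p1)
    s = rng.randint(1, n)        # inclusive bounds, matching 1 <= s <= N
    c1 = p1[:s] + p2[s:]         # c1 = p1[1:s] // p2[s+1:N] in 1-based notation
    c2 = p2[:s] + p1[s:]         # c2 = p2[1:s] // p1[s+1:N]
    return c1, c2, s


p1 = [1, 1, 1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0, 0, 0]
c1, c2, s = one_point_cross(p1, p2, random.Random(7))
print(s, c1, c2)
```

When s = N, each child is simply a copy of one parent, which is allowed by the range 1 ≤ s ≤ N.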

The following SAS/IML function implements the third crossover method. The method uses a hyperparameter, pcross, which is the probability that each location in the chromosome is selected. On average, about N·pcross locations will be selected. In the following program, pcross = 0.3, so we expect about 17 × 0.3 = 5.1 values to be exchanged between the parent chromosomes to form the children:

start uniform_cross(child1, child2, parent1, parent2) global(ProbCross);
   b = j(nrow(parent1), 1);
   call randgen(b, "Bernoulli", ProbCross); /* 0/1 vector */
   idx = loc(b=1);                    /* locations to cross */
 
   child1 = parent1;                  /* initialize children */
   child2 = parent2;
   if ncol(idx)>0 then do;            /* exchange values */
      child1[idx] = parent2[idx];     /* child1 gets some from parent2 */
      child2[idx] = parent1[idx];     /* child2 gets some from parent1 */
   end;
finish;
 
ProbCross = 0.3;                               /* crossover 30% of sites */
Items = 5:12;   p1 = j(N,1,0);  p1[Items] = 1; /* choose items 5-12 */
Items = 10:15;  p2 = j(N,1,0);  p2[Items] = 1; /* choose items 10-15 */
run uniform_cross(c1, c2, p1, p2);
print (p1`)[L="Parent1" c=(1:N)], (p2`)[L="Parent2" c=(1:N)],
      (c1`)[L="Child1" c=(1:N)], (c2`)[L="Child2" c=(1:N)];

I augmented the output to show how the child chromosomes are created from their parents. For this run, the selected locations are 1, 8, 10, 12, and 14. The first child gets all values from the first parent except for the values in these five positions, which are from the second parent. The second child is formed similarly.

When the parent chromosomes resemble each other, the children will resemble the parents. However, if the parent chromosomes are very different, the children might not look like either parent.
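The claim about similar parents can be checked with a Python sketch of the uniform crossover. At any site where the parents agree, swapping changes nothing, so both children inherit that value; only sites where the parents differ can produce a child that looks like neither parent. The function name uniform_cross and the seed are illustrative, and Python's random numbers will not reproduce the SAS output.

```python
import random


def uniform_cross(p1, p2, p_cross, rng=random):
    """Swap each site independently with probability p_cross (Bernoulli)."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(p1)):
        if rng.random() < p_cross:
            c1[i], c2[i] = p2[i], p1[i]   # child1 gets parent2's value, and vice versa
    return c1, c2


N = 17
p1 = [1 if 5 <= i <= 12 else 0 for i in range(1, N + 1)]    # items 5-12
p2 = [1 if 10 <= i <= 15 else 0 for i in range(1, N + 1)]   # items 10-15
c1, c2 = uniform_cross(p1, p2, 0.3, random.Random(2021))
agree = [i for i in range(N) if p1[i] == p2[i]]             # sites where parents match
print("Child1:", c1)
print("Child2:", c2)
```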

Notice that you must choose HOW the crossover operator works, and you must choose a hyperparameter that determines how to split the parent chromosomes. In more sophisticated crossover operations, there might be additional hyperparameters, such as the probability that a subchromosome from a parent gets reversed in the child. There are many heuristic choices to make, and the best choice is usually not known in advance.

In the SAS/IML language, there are many built-in crossover operations that you can use. They are discussed in the documentation for the GASETCRO subroutine.

Summary

Genetic algorithms can solve optimization problems that are intractable for traditional mathematical optimization algorithms. But the power comes at a cost. The user must make many heuristic choices about how the GA should work. The user must choose hyperparameters that control the probability that certain events happen during mutation and crossover operations. The algorithm uses random numbers to generate new potential solutions from previous candidates. This article used the SAS/IML language to discuss some of the choices that are required to implement these operations.

A subsequent article discusses how to implement a genetic algorithm in SAS.

The post Crossover and mutation: An introduction to two operations in genetic algorithms appeared first on The DO Loop.

October 14, 2021
 

More than four decades ago, SAS developed a breakthrough system that helped farmers and researchers better understand and analyze crop yields and livestock production. It was pioneering work that laid a foundation for the emerging field of advanced analytics. While SAS has continued to grow and serve many other industries, [...]

Growing innovation in analytics from the roots up was published on SAS Voices by John Gottula

October 14, 2021
 

Trimming strings left and right

I am pretty sure you have never heard of the TRIMS function, and I would be genuinely surprised if you told me otherwise. This is because this function does not exist (at least at the time of this writing).

But don’t worry: the difference between "nonexistence" and "existence" is only a matter of time, and from now on it is less than a blog post away. Let me explain. Recently, I published two complementary blog posts:

[1] Removing leading characters from SAS strings

[2] Removing trailing characters from SAS strings

While working on these pieces and researching “prior art,” I stumbled upon a multipurpose function in the SAS FedSQL language that single-handedly removes leading characters, trailing characters, or both from SAS strings.

FedSQL Language and Proc FedSQL

The FedSQL language is the SAS proprietary implementation of the ANSI SQL:1999 core standard. As you might expect, the FedSQL language is implemented in SAS by means of the FedSQL procedure (PROC FEDSQL). This procedure enables you to submit FedSQL language statements from a Base SAS session, and it is supported in both SAS 9.4 and SAS Viya.

Using the FEDSQL procedure, you can submit FedSQL language statements to SAS and third-party data sources that are accessed with SAS and SAS/ACCESS library engines. Or, if you have SAS Cloud Analytic Services (CAS) configured, you can submit FedSQL language statements to the CAS server.

FedSQL TRIM function

The FedSQL language has its own extensive library of FedSQL functions, hundreds of them, many of which replicate SAS 9.4 functions. Many, but not all. Deep inside this library, there is a unique treasure modestly called the TRIM function, which is quite different from the Base SAS TRIM() function.

The capabilities of the Base SAS TRIM() function are quite limited: it removes only trailing blanks from a character string. The FedSQL TRIM() function is much more powerful. This triple-action function can remove trailing blanks, leading blanks, or both leading and trailing blanks. On top of that, it can remove not just blanks but any character (although only one character at a time). See for yourself; this function has the following, fairly self-explanatory syntax:

TRIM( [BOTH | LEADING | TRAILING] [trim-character] FROM column)

Here trim-character specifies one character (in single quotation marks) to remove from column. If trim-character is not specified, the function removes blanks.
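To make the three modes concrete, here is a rough Python emulation of the FedSQL syntax. It is not FedSQL: the function name fedsql_style_trim is hypothetical, and it assumes that BOTH is the default when no keyword is given, as in ANSI SQL. Python's str.strip, str.lstrip, and str.rstrip remove any run of the given character from the corresponding end.

```python
def fedsql_style_trim(column, where="BOTH", trim_char=" "):
    """Rough emulation of TRIM([BOTH | LEADING | TRAILING] [trim-character] FROM column)."""
    if len(trim_char) != 1:
        raise ValueError("trim-character must be exactly one character")
    where = where.upper()
    if where == "BOTH":
        return column.strip(trim_char)
    if where == "LEADING":
        return column.lstrip(trim_char)
    if where == "TRAILING":
        return column.rstrip(trim_char)
    raise ValueError("where must be BOTH, LEADING, or TRAILING")


print(fedsql_style_trim("**abc**", "LEADING", "*"))   # abc**
print(fedsql_style_trim("**abc**", "TRAILING", "*"))  # **abc
print(fedsql_style_trim("  abc  "))                   # abc
```

The single-character restriction is exactly the limitation that the TRIMS function below is designed to lift.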

Although it is called a function, it does not look like a regular SAS function, whose arguments are separated by commas. It looks more like an SQL statement, which is understandable because it is part of the FedSQL language. Note that this function is available only in PROC FEDSQL; it is not available in SAS DATA steps or other PROC steps. Still, it gives us a pretty good idea of what such a universal function might look like.

User-defined function TRIMS to remove leading or/and trailing characters in SAS strings

Let’s build such a function with PROC FCMP so that it can be used outside of FedSQL (it is worth noting that the FCMP procedure is not supported for FedSQL). To avoid confusion with the existing TRIM function, we will call our new function TRIMS (with an ‘S’ at the end), which suits it well: the ‘S’ hints at its plural purpose. First, we define what we are going to create.

User-defined TRIMS function

TRIMS Function

Removes leading characters, trailing characters, or both from a character string.

Syntax

TRIMS(function-modifier, string, trim-list, trim-list-modifier)

Required Arguments

  • function-modifier is a case-insensitive character constant, variable, or expression that specifies one of three possible operations:
    'L' or 'l' – removes leading characters.
    'T' or 't' – removes trailing characters.
    'B' or 'b' – removes both leading and trailing characters.
  • string is a case-sensitive character constant, variable, or expression that specifies the character string to be trimmed.
  • trim-list is a case-sensitive character constant, variable, or expression that specifies character(s) to remove from the string.
  • trim-list-modifier is a case-insensitive character constant, variable, or expression that supplements the trim-list.
    The valid values are those modifiers of the FINDC function that “add” groups of characters (e.g. 'a' or 'A', 'c' or 'C', 'd' or 'D', etc.) to the trim-list.

The following user-defined function implementation is based on the coding techniques described in the two previous posts, [1] and [2] that I mentioned above. Here goes.

 
libname funclib 'c:\projects\functions';
 
/* delete previous function definition during debugging */
options cmplib=funclib.userfuncs;
proc fcmp outlib=funclib.userfuncs.package1;
   deletefunc trims;
run;
 
/* new function definition */
proc fcmp outlib=funclib.userfuncs.package1;
   function trims(f $, str $, clist $, mod $) $32767;
      from = 1;
      last = length(str);
      if upcase(f) in ('L', 'B') then from = findc(str, clist, 'K'||mod);
      if from=0 then return('');
      if upcase(f) in ('T', 'B') then last = findc(str, clist, 'K'||mod, -last); 
      if last=0 then return('');
      return(substr(str, from, last-from+1));      
   endfunc; 
run;

Code highlights

  • In the function definition, we first assign initial values of the target substring positions as from=1 and last=length(str).
  • Then, for Leading or Both character removal, we calculate an adjusted value of from as the position of the first character in str that is not listed in clist and not defined by the mod argument. (The 'K' modifier tells FINDC to search for characters that are NOT in the list.)
  • If from=0, we return a blank string and stop further calculations, because this means that ALL characters are to be removed.
  • Then, for Trailing or Both character removal, we calculate an adjusted value of last as the position of the last character in str that is not listed in clist and not defined by the mod argument. The negative start position -last makes FINDC search from right to left, starting at position last.
  • If last=0, we return a blank string and stop further calculations, because this means that ALL characters are to be removed.
  • And finally, we return a substring of str starting at the from position and ending at the last position, that is with the length of last-from+1.
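The same from/last logic can be sketched in Python, which may help when working out expected results before writing SAS code. The assumptions are loud ones: MODIFIER_CLASSES maps only a few FINDC modifiers ('a', 'd', 'l', 'p', 's', 'u') to ASCII character sets and is an approximation of FINDC, not a reproduction of it; Python strings are not blank-padded the way SAS character variables are; and the 0-based indexes replace SAS's 1-based positions.

```python
import string

# ASCII approximation of a few FINDC modifiers (hypothetical subset, not FINDC itself).
MODIFIER_CLASSES = {
    "a": string.ascii_letters,    # alphabetic characters
    "d": string.digits,           # digits
    "l": string.ascii_lowercase,  # lowercase letters
    "p": string.punctuation,      # punctuation marks
    "s": string.whitespace,       # space characters
    "u": string.ascii_uppercase,  # uppercase letters
}


def trims(f, s, clist, mod):
    """Remove leading ('L'), trailing ('T'), or both ('B') characters of s that
    are in clist or in a class named by mod (f and mod are case-insensitive)."""
    remove = set(clist)
    for m in mod.lower():
        remove |= set(MODIFIER_CLASSES.get(m, ""))  # unknown modifiers ignored in this sketch
    f = f.upper()
    first, last = 0, len(s) - 1                    # 0-based, inclusive
    if f in ("L", "B"):
        while first <= last and s[first] in remove:
            first += 1
    if f in ("T", "B"):
        while last >= first and s[last] in remove:
            last -= 1
    return s[first:last + 1]                        # empty if everything was removed


source = ["*00It's done*2*1**-", "*--*1****9*55", "94*Clean record-*00"]
for x in source:
    print(repr(trims("b", x, "*-", "d")),
          repr(trims("L", x, "*-", "d")),
          repr(trims("t", x, "*-", "d")))
```

For the first SOURCE record, the sketch returns It's done, It's done*2*1**-, and *00It's done; the second record, which consists only of removable characters, becomes an empty string in all three cases.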

TRIMS function usage

Let’s define SAS data set SOURCE as follows:

data SOURCE;
   input X $ 1-30;
   datalines;
*00It's done*2*1**-
*--*1****9*55
94*Clean record-*00
;

In the following DATA step, we create three new variables in which the characters '*' and '-', as well as any digits, are removed from leading positions (variable XL), from trailing positions (variable XT), and from both leading and trailing positions (variable XB):

options cmplib=funclib.userfuncs;
data TARGET;
   set SOURCE;
   length XB XL XT $30;
   XB = trims('b', X, '*-', 'd');
   XL = trims('L', X, '*-', 'd');
   XT = trims('t', X, '*-', 'd');
run;

In this code we use the TRIMS function three times, each time with a different first argument, to illustrate how this affects the outcome.

Arguments usage highlights

  • The first argument of the TRIMS function specifies whether we remove characters from both leading and trailing positions ('b'), from leading positions only ('L'), or from trailing positions only ('t'). This argument is case-insensitive. (I prefer using capital 'L' for clarity since lowercase 'l' looks like digit '1').
  • The second argument specifies the name of the variable (X) that we are going to remove characters from (variable X is coming from the dataset SOURCE).
  • The third argument '*-' specifies which character (or characters) to remove. In our example we are removing '*' and '-'. If you do not need to explicitly specify any character here, you still must supply a null value ('') since it is a required argument. In this case, the fourth argument (trim-list-modifier) will determine the set of characters to be removed.
  • And finally, the fourth argument (case-insensitive) of the TRIMS function specifies the FINDC function modifier(s) to remove certain characters in bulk (in our example 'd' will remove all digits). If such modifier is not needed, you still must supply a null value ('') since all four arguments of the TRIMS function are positional and required.

Here is the output data table TARGET showing the original string X and the resulting strings XB (both leading and trailing characters removed), XL (leading characters removed), and XT (trailing characters removed) side by side:

Result of leading and trailing characters trimming

Conclusion

The new TRIMS function presented in this blog post goes far beyond the ubiquitous LEFT and TRIM functions, which deal only with leading (LEFT) or trailing (TRIM) blanks. The TRIMS function handles ANY characters, not just blanks. It also extends the character-deletion functionality of the powerful FedSQL TRIM function beyond removing a single leading and/or trailing character. The TRIMS function single-handedly removes any number of explicitly specified characters from leading, trailing, or both leading and trailing positions. Plus, it removes many implicitly specified characters in bulk. For example, the 'd' modifier adds all digits to the set of characters to remove; 'du' adds all digits ('d') and all uppercase letters ('u'); 'dup' adds all digits ('d'), all uppercase letters ('u'), and all punctuation marks ('p'); and so on, as described by the FINDC function modifiers. The order in which modifier characters are listed does not matter.


Introducing TRIMS function to remove any leading and/or trailing characters from SAS strings was published on SAS Users.