The SAS Quality Knowledge Base (QKB) is a collection of files that store the data and logic defining data cleansing operations such as parsing, standardization, and match code generation for fuzzy matching. Various SAS software products reference the QKB when performing data quality operations on your data. One of these products is SAS Event Stream Processing (ESP), which enables programmers to build applications that quickly process and analyze streaming events. In this post, we will look at combining the power of these two products: SAS ESP and the SAS QKB.
SAS ESP Studio can call QKB definitions in its Compute window. The Compute window transforms input events into output events through computed manipulations of the input event stream fields. One such manipulation is calling a QKB definition by using the BlueFusion Expression Engine Language function.
Before QKB definitions can be used in ESP projects, the QKB must be installed on the SAS ESP server. In addition, two environment variables must be set: DFESP_QKB and DFESP_QKB_LIC. DFESP_QKB should be set to the path where the QKB data was installed. DFESP_QKB_LIC should be set to the path and filename of the file that contains the license(s) for the QKB locale(s).
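A quick way to catch a misconfigured server is to check both variables before starting a project. The following is a minimal sketch in Python, not part of SAS ESP: the helper name `check_qkb_env` is hypothetical, and it assumes DFESP_QKB points to a directory and DFESP_QKB_LIC points to a file. It only confirms that the paths exist; it does not validate the license contents.

```python
import os
from typing import List, Mapping


def check_qkb_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems; an empty list means both paths exist.

    Hypothetical helper for illustration only. It checks that the paths
    exist, not that the QKB or the license file is valid.
    """
    problems = []

    # DFESP_QKB: assumed to be the directory where the QKB data was installed.
    qkb_path = env.get("DFESP_QKB")
    if not qkb_path:
        problems.append("DFESP_QKB is not set")
    elif not os.path.isdir(qkb_path):
        problems.append(f"DFESP_QKB does not point to a directory: {qkb_path}")

    # DFESP_QKB_LIC: path and filename of the QKB license file.
    lic_path = env.get("DFESP_QKB_LIC")
    if not lic_path:
        problems.append("DFESP_QKB_LIC is not set")
    elif not os.path.isfile(lic_path):
        problems.append(f"DFESP_QKB_LIC does not point to a file: {lic_path}")

    return problems


if __name__ == "__main__":
    issues = check_qkb_env()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("DFESP_QKB and DFESP_QKB_LIC both resolve to existing paths.")
```

Run it in the same environment that starts the ESP server, so that it sees the same variable values the server will see.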
In this post, I will walk through an example: calling the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale in the ESP Compute window. The Source window reads in events that contain US state data that may or may not already be in the standard two-character US state abbreviation.
For the event stream analysis I want to perform, the US_State values need to be in a standard format. To get them there, I will use the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale.
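To make the goal concrete, here is a toy Python sketch of the kind of transformation I am after. It is only a stand-in for intuition: the lookup table, the function name `toy_standardize_state`, and the sample events (including the ID field) are all hypothetical, and the QKB definition relies on its own, much richer logic rather than a small dictionary like this one.

```python
from typing import Optional

# Hypothetical, deliberately tiny lookup table; NOT the QKB's data or logic.
_STATE_ABBREVIATIONS = {
    "north carolina": "NC",
    "n carolina": "NC",
    "n c": "NC",
    "nc": "NC",
    "california": "CA",
    "calif": "CA",
    "ca": "CA",
    "new york": "NY",
    "ny": "NY",
}


def toy_standardize_state(raw: Optional[str]) -> str:
    """Map a free-form state value to a two-character abbreviation.

    Values missing from the toy table are returned trimmed but otherwise
    unchanged. That fallback is a choice made for this sketch only.
    """
    if raw is None:
        return ""
    # Normalize: treat periods as spaces, lowercase, collapse whitespace.
    key = " ".join(raw.replace(".", " ").lower().split())
    return _STATE_ABBREVIATIONS.get(key, raw.strip())


if __name__ == "__main__":
    # Hypothetical events shaped like the ones the Source window reads.
    events = [
        {"ID": 1, "US_State": "North Carolina"},
        {"ID": 2, "US_State": " calif. "},
        {"ID": 3, "US_State": "NY"},
    ]
    for event in events:
        # Mirror the post's output field name for the standardized value.
        event["US_State_STND"] = toy_standardize_state(event["US_State"])
        print(event)
```

The point of the sketch is the shape of the result: every event keeps its original US_State value and gains a US_State_STND field that downstream analysis can group on reliably.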
First, I need to initialize the call to the BlueFusion Expression Engine Language function and load the ENUSA (English – United States) locale. Note: The license file that the DFESP_QKB_LIC environment variable points to must contain a license for this locale.
Next, I need to call the QKB definition and return its result. In this case, I am calling the BlueFusion standardize function. This function expects three arguments, in this order: Definition Name, Input Field to Standardize, and Output Field for Standardized Value. Here, the Definition Name is State/Province (Abbreviation), the Input Field to Standardize is US_State, and the Output Field for the Standardized Value is result. Note: The field result was declared in the Initialize expression described above. The value of result is returned in the output field named US_State_STND.
I can review the output of the Compute window by testing the ESP Studio project and subscribing to the Compute window.
Here is the XML code for the SAS ESP project reviewed in this blog:
Now that the US_State values have been standardized for the event stream, I can add analyses to my ESP Studio project based on those standardized values.