If you have worked with the different types of score code generated by the high-performance modeling nodes in SAS® Enterprise Miner 14.1, you have probably come across the Analytic Store (or ASTORE) file type for scoring. The ASTORE file type works very well for scoring complex machine learning models such as random forests, gradient boosting, and support vector machines. In this article, we will focus on ASTORE files generated by SAS® Viya® Visual Data Mining and Machine Learning (VDMML) procedures. An introduction to analytic stores on SAS Viya can be found here.
In this post, we will:
- Generate an ASTORE file for a gradient boosting model in SAS Visual Data Mining and Machine Learning.
- Override the scoring decision using PROC ASTORE.
Generate an ASTORE file for a gradient boosting model
Our example dataset is a distributed in-memory CAS table that contains information about applicants who were granted credit for a certain home equity loan. The binary target variable ‘BAD’ indicates whether a client defaulted on or repaid their loan. The remaining variables, which describe the applicant’s credit history, debt-to-income ratio, occupation, and so on, are used as predictors for the model. In the code below, we train a gradient boosting model on a randomly sampled 70% of the data and validate it against the remaining 30%. The SAVESTATE statement creates an analytic store (ASTORE) for the model and saves it in binary form under the name “astore_gb.”
proc gradboost data=PUBLIC.HMEQ;
    partition fraction(validate=0.3);
    target BAD / level=nominal;
    input LOAN MORTDUE DEBTINC VALUE YOJ DEROG DELINQ CLAGE NINQ CLNO / level=interval;
    input REASON JOB / level=nominal;
    score out=public.hmeq_scored copyvars=(_all_);
    savestate rstore=public.astore_gb;
    id _all_;
run;
Shown below are a few observations from the scored dataset hmeq_scored where YOJ (years at present job) is greater than 10 years.
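A listing like the one described above could be produced with a step along the following lines. This is a minimal sketch, not the code used for the original output: it assumes the `public` libref is already assigned to the CAS caslib, and the limit of 10 rows is an arbitrary choice for illustration.

```sas
/* Sketch: list a few scored observations where the applicant has   */
/* been at the present job for more than 10 years.                  */
/* Assumes the libref "public" points to the CAS caslib that holds  */
/* hmeq_scored; obs=10 is an illustrative row limit.                */
proc print data=public.hmeq_scored(obs=10);
    where YOJ > 10;
run;
```

All columns are printed here; adding a VAR statement would narrow the listing to the predictors and score variables of interest.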
Override the scoring decision using PROC ASTORE
In this segment, we will use PROC ASTORE to override the scoring decision from the gradient boosting model. To that end, we will first make use of the DESCRIBE statement in PROC ASTORE to produce basic DS2 scoring code using the EPCODE option. We will then edit the score code in DS2 language syntax to override the scoring decision produced from the gradient boosting model.
proc astore;
    describe rstore=public.astore_gb
             epcode="/viyafiles/jukhar/gb_epcode.sas";
run;
A snapshot of the output from the above statements is shown below. The analytic store is assigned a unique string identifier. We also get information about the analytic engine that produced the store (gradient boosting, in this case) and the time when the store was created. In addition, though not shown in the snapshot below, we get a list of the input and output variables used.
Let’s take a look at the DS2 score code (“gb_epcode.sas”) produced by the EPCODE option in the DESCRIBE statement within PROC ASTORE.
data sasep.out;
    dcl package score sc();
    dcl double "LOAN";
    dcl double "MORTDUE";
    dcl double "DEBTINC";
    dcl double "VALUE";
    dcl double "YOJ";
    dcl double "DEROG";
    dcl double "DELINQ";
    dcl double "CLAGE";
    dcl double "NINQ";
    dcl double "CLNO";
    dcl nchar(7) "REASON";
    dcl nchar(7) "JOB";
    dcl double "BAD";
    dcl double "P_BAD1" having label n'Predicted: BAD=1';
    dcl double "P_BAD0" having label n'Predicted: BAD=0';
    dcl nchar(32) "I_BAD" having label n'Into: BAD';
    dcl nchar(4) "_WARN_" having label n'Warnings';
    Keep "P_BAD1" "P_BAD0" "I_BAD" "_WARN_" "BAD" "LOAN" "MORTDUE" "VALUE"
         "REASON" "JOB" "YOJ" "DEROG" "DELINQ" "CLAGE" "NINQ" "CLNO" "DEBTINC";
    varlist allvars[_all_];

    method init();
        sc.setvars(allvars);
        sc.setKey(n'F8E7B0B4B71C8F39D679ECDCC70F6C3533C21BD5');
    end;

    method preScoreRecord();
    end;

    method postScoreRecord();
    end;

    method term();
    end;

    method run();
        set sasep.in;
        preScoreRecord();
        sc.scoreRecord();
        postScoreRecord();
    end;
enddata;
The sc.setKey call in the init() method block contains a string identifier for the analytic store; this is the same ASTORE identifier that PROC ASTORE reported in its output earlier. To override the scoring decision of the original gradient boosting model, we edit the gb_epcode.sas file (shown above) by inserting new statements into the postScoreRecord method block; the edited file must follow DS2 language syntax. For more information about the DS2 language, see
method postScoreRecord();
    if YOJ>10 then do;
        I_BAD_NEW='0';
    end;
    else do;
        I_BAD_NEW=I_BAD;
    end;
end;
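Once gb_epcode.sas has been edited, the new decision can be applied by passing the file back to PROC ASTORE through the SCORE statement. The sketch below makes two assumptions that are not shown in the code above: the new variable I_BAD_NEW has been declared in the DS2 program and added to its Keep list (otherwise it would likely not appear in the output), and the output table name `public.hmeq_new_scored` is a hypothetical name chosen for illustration.

```sas
/* Assumed additions to gb_epcode.sas, next to the existing           */
/* declarations (the label text is illustrative):                     */
/*     dcl nchar(32) "I_BAD_NEW" having label n'Into: BAD (override)'; */
/* and "I_BAD_NEW" appended to the Keep list.                         */

/* Score the input table with the edited DS2 code and the saved      */
/* analytic store. The output table name is hypothetical.            */
proc astore;
    score data=public.hmeq
          epcode="/viyafiles/jukhar/gb_epcode.sas"
          rstore=public.astore_gb
          out=public.hmeq_new_scored;
run;
```

In the resulting table, I_BAD_NEW should equal '0' for every applicant with YOJ greater than 10 and match the model's own I_BAD decision otherwise.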