Could data governance policies for analytics be the foundation for a model governance program?
When was the last time you or your colleagues wanted access to data and tools to produce reports and dashboards for a business need? Probably within the last hour. Self-service BI applications, which continue to gain popularity, make it faster to gain insights and make decisions. But they've also generated a greater need for governance.
Part of governance is understanding the data lifecycle, or data lineage. For example, say a co-worker modified a dataset and used it to produce a report that you would like to use to help solve a business need. How can you be sure that the information in this report is accurate? How did the producer of the report calculate certain measures? On what original data set was the report based?
SAS provides many tools to help govern platforms and solutions. Let’s look at one of those tools to understand the data lifecycle: SAS Lineage Viewer.
Here we have a report created to explore and visualize telecommunications data using SAS Visual Analytics. The report shows our variable of interest, cross-sell and up-sell flag, and its relationship to other variables. This report will be used to target customers for cross-sell or up-sell.
This report is based on an Analytical Base Table (ABT) that was created by joining two data sets:
- Usage information from a subset of customers who have contacted customer care centers.
- Cleansed demographics data.
The name of the joined dataset the report is based on is LG_FINAL_ABT.
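As a rough sketch of what such a join produces, here is a minimal Python example. All of the field names and values are illustrative assumptions, not the actual LG_FINAL_ABT schema:

```python
# Hypothetical sketch of assembling an analytical base table (ABT) by
# joining usage data with cleansed demographics on a customer ID.
# Field names and values are illustrative, not the LG_* schema.
usage = {
    "C001": {"monthly_minutes": 420, "care_contacts": 3},
    "C002": {"monthly_minutes": 180, "care_contacts": 1},
}
demographics = {
    "C001": {"state": "NC", "gender": "F"},
    "C003": {"state": "TX", "gender": "M"},
}

def build_abt(usage, demographics):
    """Inner-join the two sources on customer ID."""
    return {
        cid: {**usage[cid], **demographics[cid]}
        for cid in usage.keys() & demographics.keys()
    }

abt = build_abt(usage, demographics)
```

Note that only customers present in both sources appear in the result, which is why a joined ABT can be smaller than either input.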
To make sure we understand the data behind this report, we’ll explore it using a lineage viewer (you will need to be assigned to the “Data Management Business User” or “Data Management: Lineage” group, which an administrator can help you with). From the applications menu, select Explore Lineage.
We’ll click on Search for Subjects and search for the report we were just reviewing: Telecommunications.
I’ll enter Telecommunications in the search field then select the Telecommunications report.
The first thing I see is the LG_Final_ABT CAS table my report is dependent on.
If I click the + sign in the top right corner of the data object, LG_Final_ABT, I can see all the other relationships to that CAS table. There is a Model Studio project, two Visual Analytics reports (including the report we looked at), and a data view that are all dependent on the LG_FINAL_ABT CAS table. This diagram also shows us that the LG_FINAL_ABT CAS table is dependent on the Public CAS library, and that it was loaded into CAS from the LG_FINAL_ABT.sashdat file.
Let’s explore the LG_FINAL_ABT.sashdat file to see its lineage. Clicking on the + expands the view. In the following diagram, I expanded all the remaining items to see the full data lifecycle.
This image shows us the whole data lifecycle. From LG_FINAL_ABT.sashdat we see that it is dependent on the Create Final LG ABT data preparation plan. That plan is dependent on two CAS tables: LG_CUSTOMER and LG_ORIG_ABT. The data lineage viewer shows us that the LG_CUSTOMER table was loaded in from a CSV file (lg_customer.csv) and the LG_ORIG_ABT CAS table was loaded in from a SAS data set (lg_orig_abt.sas7bdat).
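The dependency chain the viewer draws can be thought of as a directed graph. The following Python sketch models the relationships described above; the traversal logic is an assumed illustration of how one might query such a graph, not how SAS stores lineage internally:

```python
# The lineage described above, modeled as a directed graph of
# "depends on" edges. Node names come from the diagram; everything
# else in this sketch is an assumption for illustration.
lineage = {
    "Telecommunications report": ["LG_FINAL_ABT (CAS table)"],
    "LG_FINAL_ABT (CAS table)": ["Public (caslib)", "LG_FINAL_ABT.sashdat"],
    "LG_FINAL_ABT.sashdat": ["Create Final LG ABT (plan)"],
    "Create Final LG ABT (plan)": ["LG_CUSTOMER (CAS table)",
                                   "LG_ORIG_ABT (CAS table)"],
    "LG_CUSTOMER (CAS table)": ["lg_customer.csv"],
    "LG_ORIG_ABT (CAS table)": ["lg_orig_abt.sas7bdat"],
}

def upstream(node, graph):
    """Collect everything the node ultimately depends on (depth-first)."""
    seen = []
    for parent in graph.get(node, []):
        if parent not in seen:
            seen.append(parent)
            seen.extend(p for p in upstream(parent, graph) if p not in seen)
    return seen
```

Asking for the upstream dependencies of the report walks the whole chain back to the original CSV file and SAS data set, which is exactly the question the Lineage Viewer answers visually.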
To dive deeper into the mashups and data manipulations that took place to produce LG_FINAL_ABT.sashdat, we can open the data preparation plan. To do this, I’ll right-click on Create Final LG ABT and select Actions, then Prepare Data.
Here is the data preparation plan. At the top you can see that the creator of this data set performed five steps – Gender Analysis, Standardize, Remove, Rename and Join.
To get details on each of these steps, click the titles at the top. Clicking on Gender Analysis, I see that a gender analysis was performed based on the customer_name field and the results were added to the data set in a variable named customer_name_GND.
Clicking on the Standardize title, I see that there were two standardization tasks performed on the original data set. One for customer state and the other for customer phone number. I can also see that the results were placed in new fields (customer_state_STND and customer_primary_phone_STND).
Clicking on the Remove title, I see that three variables were dropped from the final dataset. These variables were the original ones that the user had “fixed” in the previous steps: customer_gender, customer_state, and customer_primary_phone.
Clicking on the Rename title, I see that the new variables have been renamed.
The last step in the process is a join. Clicking on the Join title I see that LG_CUSTOMER was joined with LG_ORIG_ABT based on an inner join on Customer_ID.
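To make the five steps concrete, here is a hedged Python re-creation of the plan applied to a single record. The _GND and _STND suffixes mirror the plan's naming; the lookup tables, sample values, and the exact rename targets are assumptions for the sketch:

```python
# Illustrative walk-through of the five preparation steps on one record.
# The lookup tables and rename targets below are assumed, not SAS's logic.
STATE_STND = {"n. carolina": "NC", "north carolina": "NC"}
NAME_GENDER = {"mary": "F", "john": "M"}  # stand-in for gender analysis

customer = {"Customer_ID": 1, "customer_name": "Mary Smith",
            "customer_gender": "female", "customer_state": "n. carolina",
            "customer_primary_phone": "(919) 555 0100"}
orig_abt = {1: {"monthly_minutes": 420}}  # stand-in for LG_ORIG_ABT

# 1. Gender Analysis: derive a gender code from customer_name.
first_name = customer["customer_name"].split()[0].lower()
customer["customer_name_GND"] = NAME_GENDER.get(first_name, "U")

# 2. Standardize: put cleaned state and phone values in new _STND fields.
customer["customer_state_STND"] = STATE_STND.get(
    customer["customer_state"].lower(), "??")
customer["customer_primary_phone_STND"] = "".join(
    c for c in customer["customer_primary_phone"] if c.isdigit())

# 3. Remove: drop the original variables that were just "fixed".
for col in ("customer_gender", "customer_state", "customer_primary_phone"):
    customer.pop(col, None)

# 4. Rename: give the cleaned variables their final names
#    (assumed here to be the original names without the suffix).
for new, old in (("customer_gender", "customer_name_GND"),
                 ("customer_state", "customer_state_STND"),
                 ("customer_primary_phone", "customer_primary_phone_STND")):
    customer[new] = customer.pop(old)

# 5. Join: inner join with the original ABT on Customer_ID.
final_abt = {**customer, **orig_abt[customer["Customer_ID"]]}
```

The same record-level logic, applied across both tables, yields the LG_FINAL_ABT-style result the report sits on.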
We have just walked through the data lineage or data lifecycle for the dataset LG_FINAL_ABT, using SAS tools. I now understand how the data in the report we were looking at was generated. I am confident that the information that I gain from the report will be accurate.
Since sharing information and data among co-workers has become so common, it's now more crucial than ever to think about the data lifecycle. When you gain access to a report that you did not create, it is always a good idea to check the underlying data to ensure that you understand it and that any business insights gained are accurate. Also, if you are sharing data with others and want to make modifications to it, you should always check the lineage to ensure that your changes won’t undermine someone else’s work. Thanks to SAS Visual Analytics, all the tools needed to review data lineage are available within one interface.
Keep track of where data originated with data lineage in SAS was published on SAS Users.
Much of my recent work has been along the theme of modernization. Analytics is not new for many of our customers, but standing still in this market is akin to falling behind. In order to continue to innovative and remain competitive, organizations need to be prepared to embrace new technologies […]
As a Data Management expert, I am increasingly being called upon to talk to risk and compliance teams about their specific and unique data management challenges. It’s no secret that high-quality data has always been critical to effective risk management, and SAS’ market-leading Data Management capabilities have long been an integrated component of our comprehensive Risk Management product portfolio. That said, the amount of interest, project funding and inquiries around data management for risk has reached new heights in the last twelve months and is driving a lot of our conversations with customers.
It seems that not only are organisations getting serious about data management; governments and regulators are also getting into the act, enforcing good data management practices to promote the stability of the global financial system and to avoid future crises.
As a customer of these financial institutions, I am happy knowing that these regulations will make these organisations more robust and resilient in the event of future crises by instilling strong governance and best practices around how data is used and managed.
On the other hand, as a technology and solution provider to these financial institutions, I can sympathise with their pain and trepidation as they prepare and modernise their infrastructure to support their day-to-day operations while remaining compliant with these new regulations.
Globally, regulatory frameworks such as BCBS 239 are putting the focus squarely on how quality data needs to be managed and used in support of key risk aggregation and reporting.
Locally in Australia, APRA's CPG-235, in which the regulator provides principles-based guidance, outlines the roles, internal processes and data architectures needed to build a robust data risk management environment and manage data risk effectively.
Now I must say, as a long-time data management professional, this latest development is extremely exciting to me and long overdue. Yet speaking to some of our customers in risk and compliance departments, I find the same enthusiasm is definitely not shared by those charged with implementing these new processes and capabilities.
Whilst the overall level of effort involved in terms of process, people and technology should not be underestimated in these compliance-related projects, there are things that organisations can do to accelerate their efforts and get ahead of the regulators. One piece of good news is that a large portion of the compliance-related data management requirements maps well to traditional data governance capabilities. Most traditional data governance projects have focused on the following key deliverables:
• Monitoring of key data quality dimensions
• Data lineage reporting and auditing
These are also the very items that the regulators are asking organisations to deliver today. SAS’ mature and proven data governance capabilities have been helping organisations with data governance projects and initiatives over the years and are now helping financial institutions tackle risk and compliance related data management requirements quickly and cost-effectively.
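As a minimal illustration of the first of those deliverables, monitoring data quality dimensions often reduces to rule-driven metrics such as completeness and validity. The records, fields, and domain rules below are assumptions for the sketch:

```python
# Assumed sketch of monitoring two common data quality dimensions
# (completeness and validity) over a small batch of records.
records = [
    {"customer_id": "C001", "state": "NC"},
    {"customer_id": "C002", "state": "XX"},   # invalid state code
    {"customer_id": None,   "state": "TX"},   # missing key value
]
VALID_STATES = {"NC", "TX", "CA"}  # illustrative reference domain

def completeness(records, field):
    """Share of records with a non-null value in the field."""
    return sum(r[field] is not None for r in records) / len(records)

def validity(records, field, allowed):
    """Share of records whose field value falls in the allowed domain."""
    return sum(r[field] in allowed for r in records) / len(records)
```

Tracking metrics like these over time, against agreed thresholds, is the kind of evidence regulators expect a data governance program to produce.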
Incidentally, our strong data governance capabilities, along with our market-leading data quality capabilities, were cited as the main reasons SAS was selected as a category leader in Chartis Research’s first Data Management and Business Intelligence for Risk report.
The combination of our risk expertise and proven data management capabilities means we are in a prime position to help our customers with these emerging data management challenges. Check out the following white papers to get a better understanding of how SAS can help you on this journey.
For the many years that I have been involved in the area of enterprise information management, I have seen organisations struggle with the issue of data quality over and over again. I have seen IT departments struggle with the delivery of so-called “Data Quality” projects, and I have also seen businesses struggle and complain about not being able to get access to “Quality Data”.
Given that data quality technologies have matured over the years and systems integrators (SIs) have become reasonably good at delivering data quality projects, what exactly is the problem?
Among many different factors, I believe the two main reasons that organisations are still struggling with trusted data today are:
- Data quality technology is not the only component needed to build and deliver trusted data at an enterprise level.
- To gain trusted data, the business needs to be more involved in the process, along with IT.
That’s where data governance comes to the rescue. What data governance provides organisations is a more holistic view and framework for how they manage, control and leverage their data assets so that their value can be maximised. It is the missing layer that links the necessary underlying data quality technology to the ultimate goal of trusted data. Specifically, the layer that data governance inserts includes the people and process aspects that were missing from the IT-driven, purely data-quality projects of the past.
What organisations have come to realise is that trusted data depends on a robust data governance framework, and that such a framework needs a flexible, proven set of data quality tools to enforce its processes and rules. You cannot have one without the other; they are intrinsically linked.
There is no question that the detail of such undertakings and initiatives can be complex and extensive. Focusing just on the people aspects, things that organisations need to come to grips with now include:
- The right level of executive/board level support
- The right organisational structure to support the initiatives
- The identification and assignments of data stewards
As a starting point for anyone in charge of delivering a data governance initiative, the people element is perhaps the most critical one for getting projects off the ground. Here are a couple of whitepapers that go into more detail to help you get started.
- Enterprise Data Governance: The Human Element
- Advancing the Data Agenda: Roles and Responsibilities for Middle Managers
I believe that the shift from data quality to data governance is a positive one. It has elevated the discussion to the executive level and is allowing organisations to think about important elements that were missing in previous discussions or projects.
With the right foundational components and the involvement of business through the appropriate process, I believe organisations will be one step closer to delivering trusted data throughout the enterprise.