Jim Harris asks: Do you retain and maintain data, or do you have a data retention strategy?
How effective is your organization at leveraging data and analytics to power your business models?
This question is surprisingly hard for many organizations to answer. Most organizations lack a roadmap against which they can measure their effectiveness at using data and analytics to optimize key business processes, uncover new business opportunities or deliver a differentiated customer experience. They do not understand what’s possible with respect to integrating big data and data science into the organization’s business model (see Figure 1).
My SAS Global Forum 2018 presentation on Tuesday April 10, 2018 will discuss the transformative potential of big data and advanced analytics, and will leverage the Big Data Business Model Maturity Index as a guide for helping organizations understand where and how they can leverage data and analytics to power their business models.
Digital Twins, Analytics Profiles and the Power of One
We all understand that the volume and variety of data are increasing exponentially. Your customers are leaving their digital fingerprints across the Internet via their website, social media, and mobile device usage. The Internet of Things will unleash an estimated 44 Zettabytes of data across 7 billion connected people by 2020.
However, big data isn’t really about big; it’s about small. It’s about understanding your customer and product behaviors at the level of the individual. Big Data is about building detailed behavioral or analytic profiles for each individual (see Figure 2).
If you want to better serve your customers, you need to understand their tendencies, behaviors, inclinations, preferences, interests and passions at the level of each individual customer.
Customers’ expectations of their vendors are changing due to their personal experiences. Whether recommending products, services, movies, music, routes or even spouses, customers expect their vendors to understand them well enough to provide a hyper-personalized customer experience.
Demystifying Data Science (AI | ML | DL)
Too many organizations are spending too much time confusing too many executives on the capabilities of data science. The concept of data science is simple; data science is about identifying the variables and metrics that might be better predictors of business and operational performance (see Figure 3).
Whether using basic statistics, predictive analytics, data mining, machine learning, or deep learning, almost all of data science benefits are achieved from the simple formula of: Input (A) → Response (B).
By collaborating closely with business subject matter experts to choose Input (A), those variables and metrics that might be better predictors of performance, the data science team can achieve a more accurate, more granular, lower-latency Response (B). And the creative selection of Input (A) has already revolutionized many industries, and is poised to revolutionize more.
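As a minimal sketch of the Input (A) → Response (B) pattern, consider a logistic regression in SAS. The table and variable names here are hypothetical: the inputs chosen with the business experts predict a customer response.

```
/* Hypothetical example: Input (A) variables predicting a Response (B) */
proc logistic data=work.customers;
   class segment;                          /* categorical input variable */
   model churn(event='1') = recency frequency monetary segment;
   output out=work.scored p=churn_prob;    /* predicted Response (B) per customer */
run;
```

Swapping in data mining, machine learning or deep learning changes the modeling technique, not the underlying Input (A) → Response (B) formula.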
Data Monetization and the Economic Value of Data
Data is an unusual asset – it doesn’t deplete, it doesn’t wear out and it can be used across an infinite number of use cases at near zero marginal cost. Organizations have no other assets with those unique characteristics. And while traditional accounting methods of valuing assets work well with physical assets, those methods fall horribly – dangerously – short in properly determining the economic value of data.
Instead of using traditional accounting techniques to determine the value of the organization’s data, apply economic and data science concepts to determine the economic value of the data based upon its ability to optimize key business and operational processes, reduce compliance and security risks, uncover new revenue opportunities and create a more compelling, differentiated customer experience (see Figure 4).
The data lake, which can house both data and analytic models, is transformed from a simple data repository into a “collaborative value creation platform” that facilitates the capture, refinement and sharing of the data and analytic digital assets across the enterprise.
Creating the Intelligent Enterprise
When you add up all of these concepts and advancements – Big Data, Analytic Profiles, Data Science and the Economic Value of Data – organizations are poised for digital transformation (see Figure 5).
And what is Digital Transformation?
Digital Transformation is the application of digital capabilities to processes, products, and assets to improve efficiency, enhance customer value, manage risk, and uncover new monetization opportunities.
Looking forward to seeing you at my SAS Global Forum 2018 session and helping your organization on its digital transformation!
In the hype and excitement surrounding artificial intelligence and big data, most of us miss out on critical aspects of collecting, processing, handling and analyzing data. It's important for data science practitioners to understand these critical aspects and add a human touch to big data. What are these aspects? [...]
Why is it important to add a human touch to big data? was published on SAS Voices by Jay Paulson
PROC FREQ is one of the most popular procedures in the SAS language. It is mostly used to describe the frequency distribution of a variable or combination of variables in contingency tables. However, PROC FREQ has much more functionality than that. For an overview of all that it can do, see the introduction in the SAS documentation. SAS Viya does not have PROC FREQ, but that doesn’t mean you can’t take advantage of this procedure when working with big data. SAS 9.4m5 integration with SAS Viya allows you to summarize the data appropriately in CAS, and then pass the resulting summaries to PROC FREQ to get any of the statistics that it is designed to produce, from your favorite SAS 9.4 coding client. In this article, you will see how easy it is to work with PROC FREQ in this manner.
These steps are necessary to accomplish this:
- Define the CAS environment you will be using from a SAS 9.4m5 interface.
- Identify variables and/or combination of variables that define the dimensionality of contingency tables.
- Load the table to memory that will need to be summarized.
- Summarize the data with a CAS-enabled procedure; use PROC FEDSQL for high-cardinality cases.
- Write the appropriate PROC FREQ syntax using the WEIGHT statement to input cell count data.
Step 1: Define the CAS environment
Before you start writing any code, make sure that there is an _authinfo file in the appropriate user directory in your system (for example, c:\users\<userid>) with the following information:
host <cas server name> port <number> user "<cas user id>" password "<cas user password>"
This information can be found by running the following statement in the SAS Viya environment from SAS Studio:
cas; caslib _all_ assign;
Then, in your SAS 9.4m5 interface, run the following program to define the CAS environment that you will be working on:
options cashost=" " casport=;
cas mycas user=;
libname mycas cas;
/** Set the in memory shared library if you will be using any tables already promoted in CAS **/
libname public cas caslib=public;
Figure 1 shows the log and libraries after connecting to the CAS environment that will be used to deal with Big Data summarizations.
Step 2: Identify variables
The variables here are those which will be used in the TABLE statement in PROC FREQ. Also, any numeric variable that is not part of the TABLE statement can be used to determine the input cell count.
Step 3: Load table to memory
There are two options to do this. The first one is to load the table into the PUBLIC library in the SAS Viya environment directly. The second option is to load it from your SAS 9.4m5 environment into CAS. Loading data into CAS can be as easy as writing a DATA step or using other more efficient methods depending on where the data resides and its size. That is outside the scope of this article.
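The DATA step route mentioned above can be sketched as follows. The table names are illustrative, borrowing from the sample program later in this article:

```
/* Load a SAS 9 work table into the mycas CAS library (names are illustrative) */
data mycas.clicks_summary;
   set work.clicks_summary;
run;
```

For large tables, bulk-loading methods closer to where the data resides will generally outperform a DATA step transfer.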
Step 4: Summarize the data with a CAS-enabled procedure
Summarizing data for many cross tabulations can become computationally expensive, especially with big data. SAS Viya offers several ways to accomplish this (e.g., PROC MEANS, PROC MDSUMMARY, PROC FEDSQL). There are ways to optimize performance for high-cardinality summarization when working with CAS. PROC FEDSQL is my personal favorite since it has shown good performance.
Step 5: Write the PROC FREQ
When writing the PROC FREQ syntax, make sure to use the WEIGHT statement to instruct the algorithm to use the appropriate cell count in the contingency tables. All features and functionality of PROC FREQ are available here, so use them to the max! The DATA= option can point to the CAS library where your summarized table is, so there will be some overhead for data transfer to the SAS 9 work server. But, most of the time the summarized table is small enough that the transfer may not take long. An alternative to letting PROC FREQ do the data transfer is using a DATA step to bring the data from CAS into a SAS base library in SAS 9 and then running PROC FREQ with that table as the input.
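That alternative might look like this sketch, with table names matching the sample program that follows:

```
/* Bring the summarized table from CAS into a SAS 9 base library first */
data work.sql_sum;
   set mycas.sql_sum;
run;

/* Then run PROC FREQ locally, weighting by the precomputed cell counts */
proc freq data=work.sql_sum;
   weight freq;
   table date*target;
run;
```

This avoids repeated CAS-to-SAS transfers if you plan to run PROC FREQ several times against the same summary.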
Figure 2 below shows a sample program on how to take advantage of CAS from SAS 9.4m5 to produce different analyses using PROC FREQ.
Figure 3 shows the log of the summarization in CAS for a very large table in 1:05.97 minutes (over 571 million records with a dimensionality of 163,964 for all possible combinations of the values of feature, date and target). The PROC FREQ shows three different ways to use the TABLE statement to produce the desired output from the summarized table which took only 1.22 seconds in the SAS 9.4m5 environment.
/** Connect to Viya Environment **/
options cashost="xxxxxxxxxxx.xxx.xxx.com" casport=5570;
cas mycas user=calara;
libname mycas cas datalimit=all;

/** Set the in memory shared library **/
libname public cas caslib=public datalimit=all;

/** Define macro variables **/
/* CAS session where the FEDSQL will be performed */
%let casref=mycas;
/* Name of the output table of the frequency counts */
%let outsql=sql_sum;
/* Variables for cross classification cell counts */
%let dimvars=date, feature, target;
/* Variable for which frequencies are needed */
%let cntvar=visits;
/* Source table */
%let intble=public.clicks_summary;

proc fedsql sessref=&casref.;
   create table &outsql. as
   select &dimvars., count(&cntvar.) as freq
   from &intble.
   group by &dimvars.;
quit;

proc freq data=&casref..&outsql.;
   weight freq;
   table date;
   table date*target;
   table feature*date / expected cellchi2 norow nocol chisq noprint;
   output out=ChiSqData n nmiss pchi lrchi;
run;
David Loshin explains how to set up a data catalog that will help you get more value from a data lake.
The post How to use a data catalog to get more value from your data lake appeared first on The Data Roundtable.
Joyce Norris-Montanari says focus on data quality and governance, privacy and security when providing data on demand.
The post How does data on demand (for different users) change a data strategy? appeared first on The Data Roundtable.