Felix Liao

September 22, 2020
 

Everyone knows that SAS has been helping programmers and coders build complex machine learning models and solve complex business problems for many years, but did you know that you can now also build machine learning models without a single line of code using SAS Viya?

SAS has been helping programmers and coders build complex machine learning models and solve complex business problems over many years.

Building on its vision of and commitment to democratizing analytics, SAS Viya offers multiple ways to support non-programmers and empowers people with no programming skills to get up and running quickly and build machine learning models. I touched on some of the ways this can be done via SAS Visual Analytics in my previous post on analytics for everyone with SAS Viya. In addition, SAS Viya supports more advanced pipeline-based visual modeling via SAS Visual Data Mining and Machine Learning. Together, these tools' support for a low-code/no-code approach to modeling makes SAS Viya an incredibly flexible and powerful analytics platform that can help drive analytics usage and adoption throughout an organization.

As analytics and machine learning become more pervasive, an analytics platform that supports a low-code/no-code approach can get more people involved, drive ongoing innovations, and ultimately accelerate digital transformation throughout an organization.

Speed

I have met my fair share of coding ninjas who blew me away with their ability to build models from the keyboard at lightning speed. But when it comes to quickly turning an idea into a model and generating all the assessment statistics and charts, there is nothing quite like a visual approach to building machine learning models.

In SAS Viya, you can build a decision tree model literally just by dragging and dropping the relevant variables onto the canvas, as shown in the animated screen flow below.

Building a machine learning model via drag and drop

In this case, we were able to quickly build a decision tree model that predicts child mortality rates around the world. Not only do we get the decision tree in all its graphic glory (on the left-hand side of the image), we also get the overall model fit measure (Average Standard Error in this case), a variable importance chart, and a lift chart, all without having to enter a single line of code, and all in under five seconds!

You also get a set of detailed statistical outputs, including a detailed node statistics table, without having to do anything extra. This is useful when you need to review the distribution and characteristics of specific nodes in the decision tree.

Detailed node statistics table

 

What’s more, you can leverage the same drag-and-drop paradigm to quickly tune the model. In our case, you can make simple modifications, like adding a new variable by dragging a new data item onto the canvas, or apply more complex techniques, like manually splitting or pruning a node, just by clicking and selecting a node on the canvas. The whole model and visualization refresh instantly as you make changes, and you get instant feedback on the outputs of your tuning actions, which can help drive rapid iteration and idea testing.
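
For readers who do want to see code, the same kind of tree can also be fitted programmatically in SAS Viya. Below is a minimal, hedged sketch using PROC TREESPLIT from SAS Visual Data Mining and Machine Learning; the CAS library, table, and variable names are hypothetical stand-ins for the child mortality data shown above, not the actual data used in the screenshots.

```sas
/* Hedged sketch: assumes an active CAS session with a caslib assigned
   as mycas; the table and variable names are hypothetical. */
proc treesplit data=mycas.child_mortality maxdepth=6;
   class region;                                  /* categorical input    */
   model mortality_rate = region gdp_per_capita
         health_expenditure immunization_rate;    /* interval target      */
   prune costcomplexity;                          /* prune the grown tree */
run;
```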

Governance and collaboration

A graphical, component-based approach to modeling also has the added benefit of providing a stronger level of governance and fostering collaboration. Building machine learning models is often a team sport, and the ability to share and reuse models easily can dramatically reduce the cost and effort involved in building and maintaining them.

SAS Visual Data Mining and Machine Learning enables users to build complex, enterprise-grade pipeline models that support sophisticated variable selection, feature engineering techniques, and model comparison processes, all within a single, easy-to-understand, pipeline-based design framework.

Pipeline modeling using SAS VDMML

The graphical, pipeline-based modeling framework within SAS Visual Data Mining and Machine Learning leverages common components, supports self-documentation, and lets users take a template-based approach to building and sharing machine learning models quickly.

More importantly, a new user or team member who needs to review, tune, or reuse someone else’s model will find it much easier and quicker to understand the design and intent of the various components of a pipeline model and make the needed changes.

It is much easier and quicker to understand the design and intent of the various components of a pipeline model.

Communication and storytelling

Finally, and perhaps most importantly, a graphical, low-code/no-code approach to building machine learning models makes it much easier to communicate both the intent and potential impact of the model. Figures and numbers represent facts, but narratives and stories convey emotion and build connections. The visual modeling approaches supported by SAS Viya enable you to tell compelling stories, share powerful ideas, and inspire valuable actions.

SAS Viya enables you to make changes and apply filters on the fly within its various visual modeling environments. With the model training process and model outputs all represented visually, it is extremely easy to discuss business scenarios, test hypotheses, and compare modeling strategies and approaches, even with people without a deep machine learning background.

There is no question that a programmatic approach to building machine learning models offers the ultimate power and flexibility and enables data scientists to build the most complex and advanced machine learning models. But when it comes to speed, governance, and communication, a graphical, low-code/no-code approach to building machine learning models definitely has a lot to offer.

To learn more about a low-code/no-code approach to building machine learning models using SAS Viya, check out my book Smart Data Discovery Using SAS® Viya®.

The value of a low-code/no-code approach to building machine learning models was published on SAS Users.

August 25, 2020
 

Analytics is playing an increasingly strategic role in the ongoing digital transformation of organizations today. However, to succeed and scale your digital transformation efforts, it is critical to enable analytics skills at all tiers of your organization. In a recent blog post covering 4 principles of analytics you cannot ignore, SAS COO Oliver Schabenberger articulated the importance of democratizing analytics. By scaling your analytics efforts beyond traditional data science teams and involving more people with strong business domain knowledge, you can gain more valuable insights and make more significant impacts.

SAS Viya was built from the ground up to fulfill this vision of democratizing analytics. At SAS, we believe analytics should be accessible to everyone. While SAS Viya offers tremendous support and will continue to be the tool of choice for many advanced users and programmers, it is also highly accessible for business analysts and insights teams who prefer a more visual approach to analytics and insights discovery.

Self-service data management

First of all, SAS Viya makes it easy for anyone to ingest and prepare data without a single line of code. The integrated data preparation components within SAS Viya support ad-hoc, agile-oriented data management tasks where you can profile, cleanse, and join data easily and rapidly.

Automatically Generated Data Profiling Report

You can execute complex joins, create custom columns, and cleanse your data via a completely drag-and-drop interface. The automation built into SAS Viya eases the often tedious tasks of data profiling and data cleansing via automated data type identification and transform suggestions. In an area that can be both complex and intimidating, SAS Viya makes data management tasks easy and approachable, helping you to analyze more data and uncover more insights.

Data Join Using a Visual Interface
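
For those curious about what such a join looks like in code, here is a hedged sketch using FedSQL against CAS; the session name and table names (mysess, casuser.customers, casuser.deposits) are hypothetical.

```sas
/* Hedged sketch: assumes a CAS session named mysess and two loaded
   CAS tables; all names are hypothetical. */
proc fedsql sessref=mysess;
   create table casuser.customer_deposits as
   select c.customer_id,
          c.customer_name,
          d.deposit_amount
   from casuser.customers c
        inner join casuser.deposits d
           on c.customer_id = d.customer_id;
quit;
```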

A visual approach supporting low-code and no-code programming

Speaking of no-code, SAS Viya’s visual approach and support extend deep into data exploration and advanced modeling. Not only can you quickly build charts such as histograms and box plots using a drag-and-drop interface, you can also build complex machine learning models using algorithms such as decision trees and logistic regression on the same visual canvas.

Building a Decision Tree Model Using SAS Viya

By putting the appropriate guard rails in place and providing relevant, context-rich help, SAS Viya empowers users to undertake data analysis using other advanced analytics techniques such as forecasting and correlation analysis. These techniques let users ask more complex questions and can potentially help uncover more actionable and valuable insights.

Correlation Analysis Using the Correlation Matrix within SAS Viya
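
For comparison, the programmatic route to the same kind of analysis is short as well. This example uses PROC CORR against the SASHELP.CARS sample data set that ships with SAS:

```sas
/* Pearson correlation matrix over a few interval variables
   in the SASHELP.CARS sample data set */
proc corr data=sashelp.cars pearson;
   var mpg_city mpg_highway horsepower weight;
run;
```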

Augmented analytics

Augmented analytics is an emerging area of analytics that leverages machine learning to streamline and automate the process of doing analytics and building machine learning models. SAS Viya applies augmented analytics throughout the platform to automate various tasks. My favorite use of augmented analytics in SAS Viya, though, is the hyperparameter autotuning feature.

In machine learning, hyperparameters are parameters that must be set before the learning process can begin. They are used only during training, but they significantly influence how well the resulting model performs. It can often be challenging to find the optimal hyperparameter settings, especially if you are not an experienced modeler. This is where SAS Viya can help, making building machine learning models easier for everyone, one hyperparameter at a time.

Here is an example of using the SAS Viya autotuning feature to improve my decision tree model. Using the autotuning window, all I needed to do was tell SAS Viya how long I wanted the autotuning process to run. It then works its magic and determines the best hyperparameters to use, which, in this case, include the Maximum tree level and the number of Predictor bins. In most cases, you get a better model by the time you come back from getting a glass of water!

Hyperparameters Autotuning in SAS Viya

Under the hood, SAS Viya uses sophisticated optimization techniques to find the best hyperparameter combinations, all without you having to understand how it manages this impressive feat. I should add that hyperparameter autotuning is supported by many other algorithms in SAS Viya, and you have even more autotuning options when using it via the programmatic interface!
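
As a hedged illustration of that programmatic interface, here is a minimal sketch of autotuning a decision tree with the AUTOTUNE statement in PROC TREESPLIT; the table and variable names are hypothetical, and the exact set of tunable options and defaults varies by procedure and release.

```sas
/* Hedged sketch: table and variable names are hypothetical. */
proc treesplit data=mycas.child_mortality;
   class region;
   model mortality_rate = region gdp_per_capita
         health_expenditure immunization_rate;
   autotune maxtime=300;   /* cap the hyperparameter search by wall-clock time */
run;
```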

By leveraging a visually oriented framework and augmented analytics capabilities, SAS Viya is making analytics easier and machine learning models more accessible for everyone within an organization. For more on how SAS Viya enables everyone to ask more complex questions and uncover more valuable insights, check out my book Smart Data Discovery Using SAS® Viya®.

Analytics for everyone with SAS Viya was published on SAS Users.

December 6, 2017
 

Are you struggling to kick start your organization’s analytics journey, especially when it comes to leveraging advanced analytics and machine learning techniques? If the answer is yes, then you’re definitely not alone. Whilst most organisations today recognise the benefit of analytics and data science, many are still struggling to kick [...]

4 Reasons to Kick Start your Analytics Journey with SAS Visual Analytics was published on SAS Voices by Felix Liao

October 18, 2016
 

Hadoop may have been the buzzword for the last few years, but streaming seems to be what everyone is talking about these days. Hadoop deals primarily with big data in stationary and batch-based analytics. But modern streaming technologies are aimed at the opposite end of the spectrum, dealing with data in motion and […]

The post To stream or not to stream? appeared first on The Data Roundtable.

February 16, 2015
 

Data Management has been the foundational building block supporting major business analytics initiatives from day one. Not only is it highly relevant, it is absolutely critical to the success of all business analytics projects.

Emerging big data platforms such as Hadoop and in-memory databases are disrupting traditional data architecture in the way organisations store and manage data. Furthermore, new techniques such as schema-on-read and persistent in-memory data stores are changing how organisations deliver data and drive the analytical life cycle.

This brings us to the question: how relevant is data management in the era of big data? At SAS, we believe that data management will continue to be the critical link between traditional data sources, big data platforms and powerful analytics. There is no doubt that where and how big data is stored will change and evolve over time. However, that doesn’t affect the need for big data to be subject to the same quality and control requirements as traditional data sources.

Fundamentally, big data cannot be used effectively without proper data management

Data Integration

Data has always been more valuable and powerful when it is integrated, and this will remain true in the era of big data.

It is a well-known fact that whilst Hadoop is being used as a powerful storage repository for high-volume, unstructured or semi-structured information, most corporate data is still locked in traditional RDBMSs or data warehouse appliances. The true value of weblog traffic or meter data stored in Hadoop can only be unleashed when it is linked and integrated with the customer profile and transaction data stored in existing applications. The integration of high-volume, semi-structured big data with legacy transaction data can provide powerful, game-changing business insights.

Data has always been more valuable and powerful when it is integrated and this will continue to be true in the era of big data.

Big data platforms provide an alternative source of data within an organisation’s enterprise data architecture today, and therefore must be part of the organisation’s integration capability.
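
As a hedged sketch of what that integration can look like with SAS/ACCESS engines, the following joins a hypothetical Hive-resident clickstream table with customer data held in a relational warehouse; every server, credential and table name here is an assumption for illustration.

```sas
/* Hedged sketch: all connection details and table names are hypothetical. */
libname hdp hadoop server="hadoop01.example.com" port=10000
            schema=weblogs user=sasdemo password="***";
libname dw  oracle path=dwprod user=sasdemo password="***";

proc sql;
   create table work.web_by_customer as
   select c.customer_id,
          c.segment,
          count(w.session_id) as web_sessions
   from dw.customers c
        inner join hdp.clickstream w
           on c.customer_id = w.customer_id
   group by c.customer_id, c.segment;
quit;
```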

Data Quality

Just because data lives in and comes from a new data source and platform doesn’t mean high levels of quality and accuracy can be assumed. In fact, Hadoop data is notorious for its poor quality and structure, simply because of the lack of control over how easily data can get into a Hadoop environment.

Just like traditional data sources, raw Hadoop data needs to be profiled and analysed before it can be used. Often, issues such as non-standardised fields and missing data become glaringly obvious when analysts try to tap into Hadoop data sources. Automated data cleansing and enrichment capabilities within the big data environment are critical to making the data more relevant, valuable and, most importantly, trustworthy.

As Hadoop gains momentum as a general-purpose data repository, there will be increasing pressure to adopt traditional data quality processes and best practices.

Data Governance

It should come as no surprise that policies and practices around data governance will need to be applied to new big data sources and platforms. The requirements of storing and managing metadata, understanding lineage and implementing data stewardship do not go away simply because the data storage mechanism has changed.

Furthermore, the unique nature of Hadoop as a highly agile and flexible data repository also brings new privacy and security challenges around how data needs to be managed, protected and shared. Data governance will play an increasingly important role in the era of big data as the need to better align IT and business increases.

Data Governance will play an increasingly important role in the era of big data as the need to better align IT and business increases

Whilst the technology underpinning how organisations store their data is going through tremendous change, the need to integrate, govern and manage the data itself has not changed. If anything, the changes to the data landscape and the increase in the types and forms of data repositories will make data management tasks more challenging than ever.

SAS recognises the challenges faced by our customers and has continued to invest in our extensive Data Management product portfolio by embracing big data platforms from leading vendors such as Cloudera and Hortonworks, as well as supporting new data architectures and data management approaches.

As this recent NY Times article appropriately called out, a robust and automated data management platform within a big data environment is critical to empowering data scientists and analysts, freeing them from “data janitor” work so they can focus on high-value activities.

tags: big data, data governance, data management, data quality, Hadoop
October 24, 2014
 

I have been on a whirlwind tour locally here in Australia, visiting existing SAS customers, where the focus of discussions has centered around SAS and Hadoop. I am happy to report that during these discussions, customers have been consistently surprised and excited about what we are doing with SAS on Hadoop! Three things in particular stood out and resonated well with our wonderful SAS user community, so I thought I would share them here for the benefit of the broader community.

1. All SAS products are Hadoop enabled today

Whilst some of our newer products, such as Visual Analytics and In-Memory Statistics for Hadoop, were built from day one with Hadoop in mind, you might not be aware that in fact all of our current SAS products are Hadoop enabled and can take advantage of Hadoop today.

Our mature and robust SAS/ACCESS Interface to Hadoop technology allows SAS users to easily connect to Hadoop data sources from any SAS application. A key point here is being able to do this without having to understand any of the underlying technology or write a single line of MapReduce code. Furthermore, the SAS/ACCESS Interface to Hadoop has been optimised and can push SAS procedures into Hadoop for execution, allowing developers to tap into the power of Hadoop and improve the performance of basic SAS operations.
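
To make that concrete, here is a hedged sketch of the LIBNAME approach; the server and table names are hypothetical, and which operations actually get pushed down depends on your in-database processing configuration.

```sas
/* Hedged sketch: server and table names are hypothetical. */
libname hdp hadoop server="hadoop01.example.com" port=10000
            schema=default user=sasdemo;

/* Eligible work such as this summarisation can be pushed into
   Hadoop rather than pulling the rows back to SAS, subject to
   in-database processing support and configuration. */
proc freq data=hdp.weblogs;
   tables status_code;
run;
```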

2. SAS does Analytics in Hadoop

The SAS R&D team have worked extremely hard with our Hadoop distribution partners to take full advantage of the powerful technologies within the Hadoop ecosystem. We are driving integration deep into the heart of the Hadoop ecosystem with technologies such as HDFS, Hive, MapReduce, Pig and YARN.

The SAS users I have been speaking to have been pleasantly surprised by the depth of our integration with Hadoop and excited about what it means for them as end users. Whether it’s running analytics in our high-performance in-memory servers within a Hadoop cluster or pushing analytics workloads deep into the Hadoop environment, SAS is giving users the power and flexibility to decide where and how they want to run their SAS workloads.

This point was powerfully made by none other than one of the co-founders of Hortonworks in his recent blog post, and I couldn’t have phrased it better myself!

“Integrating SAS HPA and LASR with Apache Hadoop YARN provides tremendous benefits to customers using SAS products and Hadoop. It is a great example of the tremendous openness and vision shown by SAS”

3. Organisations are benefiting from SAS on Hadoop today

With Hadoop being THE new kid on the block, you might be wondering if any customers are already taking advantage of SAS and Hadoop now. One such customer is Rogers Media – they’ve been doing some pretty cool stuff with SAS and Hadoop to drive real business value and outcomes!

In a chat with Dr. Goodnight during SAS Global Forum this year, Chris Dingle from Rogers Media shared how they are using SAS and Hadoop to better understand their audience. I was fortunate enough to be there in person, and I must say the keynote session on Hadoop and Rogers Media was a highlight for many people there and definitely got the masses thinking about what they should be doing with SAS and Hadoop. For those of you who are interested in more details, here is a recap of the presentation explaining the SAS/Hortonworks integration, as well as more details on the Rogers Media case study.

We are working with a number of organisations around the world on exciting SAS on Hadoop projects so watch this space!

All in all, it’s a great time to be a SAS user, and it has never been easier to take advantage of the power of Hadoop. I encourage you to find out more, reach out to us or leave comments here, as we would love to hear about how you plan to leverage the power of SAS and Hadoop!

tags: Hadoop
July 7, 2014
 

As a Data Management expert, I am increasingly being called upon to talk to risk and compliance teams about their specific and unique data management challenges. It’s no secret that high-quality data has always been critical to effective risk management, and SAS’ market-leading Data Management capabilities have long been an integrated component of our comprehensive Risk Management product portfolio. Having said that, the amount of interest, project funding and inquiries around data management for risk has reached new heights in the last twelve months and is driving a lot of our conversations with customers.

It seems that not only are organisations getting serious about data management, governments and regulators are also getting into the act, enforcing good data management practices in order to promote the stability of the global financial system and to avoid future crises.

As a customer of these financial institutions, I am happy knowing that these regulations will make these organisations more robust and resilient in the event of future crises by instilling strong governance and best practices around how data is used and managed.

On the other hand, as a technology and solution provider to these financial institutions, I can sympathise with their pain and trepidation as they prepare and modernise their infrastructure to support their day-to-day operations while remaining compliant with these new regulations.

Globally, regulatory frameworks such as BCBS 239 are putting the focus and attention squarely on how quality data needs to be managed and used in support of key risk aggregation and reporting.

Locally in Australia, APRA's CPG-235, in which the regulator provides principles-based guidance, outlines the types of roles, internal processes and data architectures needed to build a robust data risk management environment and manage data risk effectively.

Now I must say, as a long-time data management professional, this latest development is extremely exciting to me and long overdue. Speaking to some of our customers in risk and compliance departments, however, the same enthusiasm is definitely not shared by those charged with implementing these new processes and capabilities.

Whilst the overall level of effort involved in terms of process, people and technology should not be underestimated in these compliance-related projects, there are things organisations can do to accelerate their efforts and get ahead of the regulators. One piece of good news is that a large portion of the compliance-related data management requirements map well to traditional data governance capabilities. Most traditional data governance projects have focused on the following key deliverables:

•      Common business definitions

•      Monitoring of key data quality dimensions

•      Data lineage reporting and auditing

These are also the very items that regulators are asking organisations to deliver today. SAS’ mature and proven data governance capabilities have been helping organisations with data governance projects and initiatives over the years, and are now helping financial institutions tackle risk- and compliance-related data management requirements quickly and cost-effectively.

Incidentally, our strong data governance capabilities, along with our market-leading data quality capabilities, were cited as the main reasons SAS was selected as a category leader in Chartis Research’s first Data Management and Business Intelligence for Risk report.

The combination of our risk expertise and proven data management capabilities means we are in a prime position to help our customers with these emerging data management challenges. Check out the following white papers to get a better understanding of how SAS can help you on this journey.

•      BCBS 239: Meeting Regulatory Obligations While Optimizing Cost Reductions

•      Risk and Compliance in Banking: Data Management Best Practices

•      Best Practices in Enterprise Data Governance

 

tags: compliance, governance, risk
May 22, 2014
 

Next-generation business intelligence and visualisation tools, such as SAS Visual Analytics, are revolutionising insight discovery by offering a truly self-service platform powered by sophisticated visualisations and embedded analytics. It has never been easier to get hold of vast amounts of data, visualise that data, uncover valuable insights and make important business decisions, all in a single day’s work.

On the flip side, the speed and ease of getting access to data, and then uncovering and delivering insights via powerful charts and graphs, have also exacerbated the issue of data quality. It is all well and good when the data being used by analysts is clean and pristine. More often than not, though, when the data being visualised is of poor quality, the output and results can be telling and dramatic, but in a bad way.

Let me give you an example from a recent customer discussion to illustrate the point (I have, of course, synthesised the data here to protect the innocent!).

Our business analyst at ACME Bank has been tasked with analysing customer deposits to identify geographically oriented patterns, as well as identifying the top 20 customers by total deposit amount. These are simple but classic questions that are perfectly suited to a data visualisation tool such as SAS Visual Analytics.

We will start with a simple cross-tab visualisation to display the aggregated deposit amount across the different Australian states:

Cross-tab of deposit amounts by state, showing non-standardised state values

Oops – the problem of non-standardised state values means that this simple cross-tab view is basically unusable. The fact that New South Wales (a state in Australia) is represented nine different ways in our STATE field presents a major problem whenever the state field is used to aggregate a measure.

In addition, the fact that the source data only contains a full address field (FULL_ADDR) means that we are unable to build the next level of geographical aggregation using city, as the city is embedded in the FULL_ADDR free-form text field.

Source data with the city embedded in the FULL_ADDR free-form text field

It would be ideal if FULL_ADDR were parsed out so that street number, street name and city were all individual, standardised fields that could be used as additional fields in a visualisation.

How about our top 20 customers list table?

Top 20 customers list table, showing duplicated customer records

Whilst a list table sorted by deposit amount should easily give us what we need, a closer inspection reveals troubling signs that we have duplicated customers (with names and addresses typed slightly differently) in our customer table. This is a major problem that will prevent us from building a true top 20 customers list table unless we can confidently match up all the duplicated customers and work out their true total deposits with the bank.

All in all, you probably don’t want to share these visualisations with key executives using the dataset you were given by IT. The scariest thing is that these are the data quality issues that are obvious to the analyst. Without a thorough data profiling process, other surprises may be just around the corner.

One of two things typically happens from here. Organisations might find it too difficult and give up on the dataset, the report or the data visualisation tool altogether. The second option typically involves investing significant cost and effort in hiring an army of programmers and data analysts to code their way out of the data quality problems, often without a detailed understanding of the true cost involved in building a scalable and maintainable data quality process.

There is, however, a third and better way. In contrast to other niche visualisation vendors, SAS has always believed in the importance of high-quality data in analytics and data visualisation. SAS offers mature and integrated Data Quality solutions within its comprehensive Data Management portfolio that can automate data cleansing routines, minimise the costs involved in delivering quality data and ultimately unleash the true power of visualised data.

There is, however, a third and better way.

Whilst incredibly powerful and flexible, our Data Quality solution is also extremely easy for business users to pick up, with minimal training and no detailed knowledge of data cleansing techniques required. Without the need to code or program, powerful data cleansing routines can be built and deployed in minutes.

I built a simple data quality process using our solution to illustrate how easy it is to identify and resolve the data quality issues described in this example.

Here is the basic data quality routine I built using SAS Data Management Studio. It consists of a series of data quality nodes that resolve each of the data quality issues we identified above via pre-built data quality rules and a simple drag-and-drop user interface.

The data cleansing routine in SAS Data Management Studio

For example, here is the configuration for the “Address standardisation” data quality node. All I had to do was define which locale to use (English – Australia in this case), which input fields I wanted to standardise (STATE, DQ_City), which data quality definitions to use (City/State and City) and what the output fields should be called (DQ_State_Std and DQ_City_Std). The other nodes take a similar approach to automatically parse the full address field and match similar customers using their name and address, creating a new cluster ID field called DQ_CL_ID (we’ll get to this in a minute).

Configuration of the “Address standardisation” data quality node
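
For SAS programmers, similar cleansing logic can also be written with SAS Data Quality Server functions. This is a hedged sketch only: it assumes the ENAUS QKB locale has been installed and loaded (for example via %DQLOAD), and the definition names ('State/Territory', 'City', 'Name') and field names are illustrative, since exact definition names vary by QKB version.

```sas
/* Hedged sketch: assumes the ENAUS locale is loaded; definition and
   field names are illustrative. */
data work.customers_clean;
   set work.customers;
   length dq_state_std dq_city_std $ 40 dq_match_cd $ 20;

   /* Standardise the state and city values */
   dq_state_std = dqStandardize(state,   'State/Territory', 'ENAUS');
   dq_city_std  = dqStandardize(dq_city, 'City',            'ENAUS');

   /* Build a fuzzy match code from the customer name for duplicate clustering */
   dq_match_cd  = dqMatch(full_name, 'Name', 85, 'ENAUS');
run;
```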

I then loaded the newly cleansed data into SAS Visual Analytics to tackle the questions I was asked in the first place.

The cross-tab now looks much better, and I now know (for sure) that the best-performing state in terms of deposit amount is New South Wales (now standardised as NSW), followed by Victoria and Queensland.

Cross-tab of deposit amounts using the standardised state field

As a bonus for having clean, high-quality address data, I can also now easily visualise the geo-based measures on a map, down to the city level, since we have access to the parsed-out, standardised city field! Interestingly, our customers are spread out quite evenly across the state of NSW, something I wasn’t expecting.

Geo map of deposit amounts, down to the city level

As for the top 20 customers list table, I can now use the newly created cluster field, DQ_CL_ID, to group similar customers together and add up their total deposits to work out who my top 20 customers really are. As it turns out, a number of our customers have multiple deposit accounts with us and go straight to the top of the list when their various accounts are combined.

Top 20 customers list table using the new cluster ID field

I can now clearly see that Mr. Alan Davies is our number one customer, with a combined deposit amount of $1,621,768, followed by Mr. Philip McBride, both of whom will get the special treatment they deserve whenever they are targeted for marketing campaigns.
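
In code, that roll-up is a simple aggregation. Here is a hedged sketch continuing the illustrative field names from above:

```sas
/* Hedged sketch: field names are illustrative. Roll deposits up to the
   duplicate-cluster level and keep the 20 largest combined balances. */
proc sql outobs=20;
   select dq_cl_id,
          max(customer_name)  as customer_name,
          sum(deposit_amount) as total_deposit format=dollar16.
   from work.customers_clean
   group by dq_cl_id
   order by total_deposit desc;
quit;
```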

All in all, I can now comfortably share my insights and visualisations with business stakeholders, knowing that any decisions made are based on sound, high-quality data. And I was able to do all this with minimal support, all in a single day’s work!

Is poor-quality data holding you back in your data visualisation projects? Interested in finding out more about SAS Data Quality solutions? Come join us at the Data Quality hands-on workshop and discover how you can easily tame your data and unleash its true potential.

tags: data management, data quality, data visualisation, visual analytics
December 14, 2011
 

When it comes to using social media, I must admit I’m a latecomer. I always wondered what could actually be said in 140 characters on Twitter. However, as I started using and embracing social media, I’ve come to realise its power as a communication platform, and what the future holds as the technology becomes more mainstream.

As I slowly move up the social media learning curve (and it can be a steep one!) and move beyond tweeting from Twitter and updating my status on Facebook, I’ve also come to realise the flexibility such open platforms offer. The huge amount of innovation around social media means that there are now countless ways of working and interacting with these platforms. In addition, new (and sometimes crazy!) ways of using social media platforms are being discovered and invented every day.

An example of the flexibility and options I am talking about played out as I was trying to send a social media message to my HUGE group of loyal followers across all the different social media platforms a few days ago. I was amazed (and troubled) by the number of considerations I had to make:

  • Who should I share it with? All, work related, family and friends, customers, specific subset of people, should I exclude certain people?
  • Do I make it public or private? Do I want the whole world to know about my message?
  • What platform should I use? Twitter, Linkedin, Google+, Facebook?
  • What tags (or hashtags) should I use with the message? Which keywords, if any, do I want to emphasise in my message?
  • What account should I use to share it? Should I tweet as myself or use our corporate Twitter account?
  • What tools should I use to send the message? At last count, I have at least 20 tools/apps (across the different devices) I can use to send a tweet as I sit at my desk.
  • When should I share it? Let’s face it, power tweeters don’t stay up 24 hours a day tweeting to you! There are tonnes of tools to help you schedule or buffer your social media messages and get your message across throughout the day!
  • Should I geo-tag the message? Do I want people to know where I am sending the message from?

Granted, not everyone goes through this many considerations when trying to tell the world what they had for breakfast, but as more people use social media platforms for content sharing and exerting influence, I think going through many of the considerations I did will become the norm.

So what does that mean for organisations or brands trying to use social media to better understand their customers or prospects?

Simply put, every single one of the above factors is an important variable that reveals more about my intentions, influence and level of engagement beyond the 140 text characters in the message itself. If there is one thing I’ve learnt about statistics, it’s that it’s always good to capture as much information as you can! The ability to mine and better understand your customers increases as you take more of these signals into consideration.

Next time you read another 140-character Twitter or Facebook message, see what else you can work out about the author or the message itself!