According to the latest Gartner Magic Quadrant for Data Quality, DataFlux (a wholly owned subsidiary of SAS) continues to be the market leader when it comes to Data Quality. I say “continues” because DataFlux has held that same leadership position for the last three years!
Why is this important, and what does it mean for our customers? Well, that Data Quality is the foundation of everything BI- or analytics-related is more or less a given, and old news these days. The challenges of delivering trusted data to downstream reporting or analytical processes have always been there, and they are becoming more critical as the need for more accurate reporting and timely insight intensifies. What’s more, many organisations are also undertaking new initiatives that all need to be built on the foundation of sound Data Quality processes and capabilities.
1. Data Governance
Organisations that recognise the importance of data as a strategic asset and a key competitive advantage almost inevitably kick off “Data Governance” projects or initiatives. Sometimes this is driven by good, proactive executive leadership; sometimes by looming regulatory and compliance needs. While a sound data governance program should involve a combination of people, process, methodology, and technology, Data Quality should form the foundational component of any technology discussion. A good, robust data governance framework is critical for long-term success, but it should also deliver at least some key outcomes in the form of specific Data Quality metrics that data stewards can work with.
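To make "specific Data Quality metrics" a little more concrete, here is a minimal Python sketch of two metrics a data steward might track: completeness (how many records have a value in a field) and validity (how many of those values pass a rule). The field names, sample records, and the deliberately simple email pattern are hypothetical, and this is not how DataFlux itself computes its metrics.

```python
import re

# Hypothetical validity rule for this sketch: a deliberately simple email pattern
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def completeness(records, field):
    """Share of records where `field` is present and non-blank."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)

def validity(records, field, pattern):
    """Share of non-blank values of `field` that match `pattern`."""
    values = [str(r[field]).strip() for r in records if str(r.get(field) or "").strip()]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)

# Hypothetical sample records
customers = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Bob Tan", "email": "bob(at)example.com"},
    {"name": "", "email": None},
    {"name": "Cy Ng", "email": "cy@example.org"},
]

print(f"name completeness:  {completeness(customers, 'name'):.0%}")
print(f"email completeness: {completeness(customers, 'email'):.0%}")
print(f"email validity:     {validity(customers, 'email', EMAIL_RE):.0%}")
```

Tracked over time, numbers like these give stewards something measurable to improve, which is the kind of early outcome a governance program can point to.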
2. Customer Centricity
As industries and markets mature, the ability to win, keep and profit from your existing customers becomes more important than ever. CIOs today often cite customer centricity as their number-one strategic initiative. It doesn’t take a genius to work out that customer centricity requires an organisation to obtain a single, consolidated view of each customer across multiple business and product lines. Are some organisations there? Yes, but dare I say most are not. Data Quality and the ability to consolidate entities are the foundation of customer centricity, which is a key reason why they often form the foundation of any Master Data Management solution.
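The idea of consolidating entities can be sketched in a few lines: standardise the fields used for matching, build a match key, and group records that share it into one customer. This is a minimal Python sketch assuming a hypothetical exact-match rule (standardised name plus lower-cased email) and made-up source systems; real matching engines typically add fuzzy matching and survivorship rules on top of this.

```python
from collections import defaultdict

def standardise(value):
    """Lower-case, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def match_key(record):
    # Hypothetical match rule: standardised name + trimmed, lower-cased email
    return (standardise(record["name"]), record["email"].strip().lower())

def consolidate(records):
    """Group records that share a match key into one customer entity."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[match_key(rec)].append(rec)
    return [
        {
            "name": group[0]["name"],
            "email": group[0]["email"].strip().lower(),
            "sources": sorted({r["source"] for r in group}),
        }
        for group in clusters.values()
    ]

# Hypothetical records from two product lines
records = [
    {"source": "cards", "name": "Mary O'Brien", "email": "MARY@example.com"},
    {"source": "mortgage", "name": "mary obrien", "email": "mary@example.com "},
    {"source": "cards", "name": "John Smith", "email": "js@example.com"},
]

for customer in consolidate(records):
    print(customer)
```

Here the two "Mary" records only match because standardisation runs first, which is why Data Quality sits underneath entity consolidation rather than beside it.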
3. Application migration/consolidation
This one is often forced upon the IT team, but it is more than likely happening in your organisation as we speak. Because these projects often have an absolute deadline that cannot be moved, they carry a high degree of risk. All too often, the systems that need to be moved or consolidated have little documentation (assuming there is still anyone who knows the system at all!). Managing that risk requires the organisation to understand what information is actually in these systems. Data Quality offers a way to automate the cleansing of the data to be loaded, so that only correct data gets loaded and the project can still be delivered on time.
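Automated cleansing before a migration load can be pictured as a set of rules applied to every row, with each row either standardised for loading or rejected with a reason. The Python sketch below is illustrative only: the legacy date formats, the eight-digit phone threshold, and the field names are all hypothetical assumptions, not rules from any particular system.

```python
from datetime import datetime

# Hypothetical date formats the legacy system is assumed to contain
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d-%b-%Y")

def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def cleanse(row):
    """Return (clean_row, None) on success or (None, reason) on rejection."""
    dob = parse_date(row.get("dob", ""))
    if dob is None:
        return None, "unparseable dob"
    phone = "".join(ch for ch in row.get("phone", "") if ch.isdigit())
    if len(phone) < 8:  # hypothetical minimum length
        return None, "phone too short"
    return {"id": row["id"], "dob": dob, "phone": phone}, None

def prepare_load(rows):
    """Split legacy rows into cleansed rows to load and rejects to review."""
    to_load, rejects = [], []
    for row in rows:
        clean, reason = cleanse(row)
        if clean is not None:
            to_load.append(clean)
        else:
            rejects.append((row["id"], reason))
    return to_load, rejects

# Hypothetical legacy rows
legacy = [
    {"id": 1, "dob": "03/11/1975", "phone": "(02) 9876 5432"},
    {"id": 2, "dob": "1980-02-30", "phone": "0412 345 678"},
    {"id": 3, "dob": "12-Jan-1969", "phone": "555"},
]

loaded, rejected = prepare_load(legacy)
print(loaded)
print(rejected)
```

Keeping the rejects with a reason, rather than silently dropping them, is what lets the project team see what is actually in an undocumented system before the deadline arrives.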
At SAS, we believe that Data Quality is critical in helping our customers solve complex problems and challenges such as the ones mentioned above. It's one of the reasons why it's a foundation piece of our extensive data management portfolio. It's great that Gartner agrees that we have the best tool in the market, but it's about more than that. We also have the right people with the right know-how, something that's critical in helping customers solve their most complex business problems.
Seth Grimes posted a fantastic article on Text Data Quality yesterday. It is a must-read for anyone in this space, and it points to some of the text quality issues I mentioned in my last two blog posts. Text is in a league of its own when it comes to data quality: the more you work with data generated on social media, the more you will run into non-standard text and the need for text cleansing. In June, at the Text Analytics Summit, I presented a workshop on "The Ten Transgressions of Text":
4. Shrt-hnd or clipped text (e.g. hmm tink nid >2 twitter acs; els msgs all jumbled up btwn personal & thots! dilemma!)
6. !!NOISY TEXT!!
8. ♪ Voice ♫
9. Email / Attachments
10. Poor grammar
Customers ask me if we can automatically remove profanity from documents and, yes, WE CAN!
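As a rough illustration of what automatic profanity removal involves at its simplest, here is a word-list masking sketch in Python. The (deliberately mild) word list is hypothetical, and this is not the SAS approach; it only shows why matching on word boundaries matters, so that an innocent word containing a listed term is left alone.

```python
import re

# Hypothetical, deliberately mild word list; a real deployment would use a curated lexicon
PROFANITY = {"darn", "heck", "crap"}

# Whole-word, case-insensitive match on any listed term
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(PROFANITY))) + r")\b",
    re.IGNORECASE,
)

def mask_profanity(text):
    """Replace each listed word with asterisks of the same length."""
    return _PATTERN.sub(lambda m: "*" * len(m.group(0)), text)

print(mask_profanity("Darn it, this heckling is CRAP."))
# prints: **** it, this heckling is ****.
```

A word list like this misses creative spellings (the clipped and noisy text described above), which is where linguistic rules have to take over.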
My interest in the sort of shortened/clipped text you get in text messages or on Twitter is huge. There is a lot that text analytics users and vendors can do to work with this data. Terms like "cul8r" (see you later) or "LOL" (laughing out loud / lots of love) could be expanded into their intended forms, mapped to other synonyms (we provide ontologies to handle this), or left as is. When a shortened term can mean several different things depending on context, that's when linguistics can help. I see a big need for including this new 'language' in standard language dictionaries.
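The three options above (expand, map, or leave as is) can be sketched with a simple lookup table: expand a shorthand term when it has exactly one meaning, and leave it untouched but flagged when it has several, so that a context-aware linguistic step can decide later. The lexicon below is a hypothetical stand-in for a real dictionary or ontology, and the punctuation handling is intentionally minimal.

```python
# Hypothetical shorthand lexicon: term -> possible expansions
LEXICON = {
    "cul8r": ["see you later"],
    "btwn": ["between"],
    "thots": ["thoughts"],
    "lol": ["laughing out loud", "lots of love"],
}

def normalise(text):
    """Expand unambiguous shorthand; keep ambiguous terms as-is and report them."""
    out, ambiguous = [], []
    for tok in text.split():
        core = tok.rstrip(".,!?")      # keep trailing punctuation aside
        suffix = tok[len(core):]
        senses = LEXICON.get(core.lower())
        if senses is None:
            out.append(tok)            # not shorthand we know about
        elif len(senses) == 1:
            out.append(senses[0] + suffix)
        else:
            # Context-dependent: a disambiguation step would decide here
            out.append(tok)
            ambiguous.append((core.lower(), senses))
    return " ".join(out), ambiguous

print(normalise("Gotta go, cul8r!"))
print(normalise("lol"))
```

Flagging rather than guessing keeps the ambiguous cases visible, which is exactly where the linguistics (and a richer ontology) earn their keep.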
Adhering to the standard rules of grammar looks like a thing of the past. As traditional print media loses favor, so will standard grammar in social media (blogs, micro-blogs such as Twitter, and social networks such as Facebook and Bebo). I'm excited to see how other natural language processing technologies will change to accommodate this new breed of user.