Although the origin of this statistic is much debated, most industry experts agree that 80% to 90% of the world’s data is unstructured, and that about 90% of it was produced in the last two years alone. Of these unthinkably vast stores, only 0.5% is effectively analyzed and used today.
In the business world, most unstructured data lies in customer-related text, which is abundant and available. However, most organizations don’t know how to efficiently extract predictive elements from unstructured customer data. Nor are they sure how to reap the value of these insights by using them to boost the performance of predictive analytics and make better operational decisions about customers. Done right, however, extracting valuable predictive insights from huge quantities of text takes just seconds.
The Future Is Unstructured
The tech industry is full of predictions, but I have high confidence in this one: the future is unstructured, because unstructured data holds the key to the next generation of intelligent systems, which will be largely based on cognitive analytics and artificial intelligence (AI) applications.
How do I know? I’ve spent most of my career as a data scientist at FICO, an analytics company perhaps best known for the FICO® Score and for software solutions that help financial institutions optimize credit originations and fight financial fraud.
All of these solutions, which are pervasive in the banking world, have structured data as their foundation. But for the past several years, I’ve been exploring new frontiers for FICO, developing novel intelligence analytics across complex networks at the forefront of the analytics revolution.
Specifically, my active research includes knowledge discovery and behavior prediction using supervised and unsupervised machine learning methods, as well as relational learning and network probabilistic inference methods. For those of you who aren’t data scientists, that translates into using unstructured data analysis and relational learning to drive predictive analytics solutions for applications as diverse as:
- Credit originations
- Fraud, waste and abuse detection
- Insurance prediction and modeling
- Risk stratification and minimization
- Anticipating resource demands
- Improving marketing campaigns and customer retention management strategies
What Is Text Analytics?
Text analytics helps deliver fresh intelligence by mining a major category of unstructured data: the massive stores of customer data that many organizations already have on hand. Here are several important benefits:
- Discovery through machine learning: Text in email messages, call center logs, CRM applications and collection agent notes is readily understood by people, but it’s meaningless to traditional predictive models. Machine learning methods, however, can determine what textual data is about and classify it for further analysis. These methods can discover customer characteristics and transform them into structured numerical inputs that, in turn, can be used in predictive models and traditional analytic algorithms.
- Handling complexity: Unstructured and semi-structured text (e.g., XML files, Excel spreadsheets, weblogs) is inherently complex, since it may contain a wide range of content on a broad array of topics. Moreover, the potential value of text analysis often increases when text of different types from multiple sources is combined. Such a comprehensive approach may reveal complex, subtle customer behavior patterns not evident in smaller, more homogeneous document sets. But the task of collating, regularizing and organizing such diverse data would be impossible without today’s advanced technologies. We now have the analytic techniques and data infrastructures to merge disparate varieties of text into one “document” for analysis.
- Facilitating engineering, deployment, management and regulatory compliance: While text and the process of analyzing it can be quite complex, the results need to be simple to understand and use. Today we can, for example, bring new insights from text analysis into predictive scorecards while maintaining all the advantages scorecards provide.
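To make the benefits above a little more concrete, here is a minimal Python sketch of the general idea: text from several sources is merged into one “document” per customer, turned into structured numeric inputs, and fed into a simple points-based scorecard. It is an illustration under loud assumptions, not FICO’s method: a hand-written keyword lexicon stands in for the machine learning methods described above, and the lexicon, feature names (`financial_stress`, `dispute`, `payment_intent`), point values and base score are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword lexicon (term -> feature). A stand-in for learned
# text classifiers; a real system would discover these associations from data.
LEXICON = {
    "hardship": "financial_stress",
    "layoff": "financial_stress",
    "unemployed": "financial_stress",
    "dispute": "dispute",
    "fraud": "dispute",
    "promise": "payment_intent",
    "paid": "payment_intent",
}

# Hypothetical scorecard points per feature: (points if absent, points if present).
POINTS = {
    "financial_stress": (0, -25),
    "dispute": (0, -15),
    "payment_intent": (0, 15),
}
BASE_SCORE = 600  # hypothetical starting score


def merge_sources(records):
    """Merge (customer_id, source, text) rows into one document per customer."""
    parts = defaultdict(list)
    for customer_id, _source, text in records:
        parts[customer_id].append(text)
    return {cid: " ".join(texts) for cid, texts in parts.items()}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def text_features(document):
    """Convert free text into structured numeric inputs: keyword hits per feature."""
    hits = Counter(LEXICON[tok] for tok in tokenize(document) if tok in LEXICON)
    return {name: hits[name] for name in sorted(POINTS)}


def score(features):
    """Add binned scorecard points for each text-derived feature to the base score."""
    total = BASE_SCORE
    for name, count in features.items():
        absent, present = POINTS[name]
        total += present if count > 0 else absent
    return total


if __name__ == "__main__":
    records = [
        ("C1", "call_log", "Customer reports a layoff and financial hardship."),
        ("C1", "agent_note", "Promise to pay next Friday."),
        ("C2", "email", "I want to dispute this charge."),
    ]
    for cid, doc in sorted(merge_sources(records).items()):
        feats = text_features(doc)
        print(cid, feats, score(feats))
```

Because the text-derived inputs end up as ordinary numeric characteristics with fixed point values, the resulting scorecard stays as easy to read and explain as one built purely on structured data.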
In my next blog post I’ll take a deeper dive into how text analytics is used to boost the predictive power of scorecard models. In the meantime, if you’d like to learn more about FICO’s latest research in unstructured data, read my scientific paper published by IEEE, “Mining and Visualizing Associations of Concepts on a Large-scale Unstructured Data.” Follow me on Twitter @odriollet … and long live the analytics revolution!