Leveraging multiple data sources to augment ad hoc and exploratory analytics and decisions
A data lake is a collection of storage instances of various data assets, held in addition to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format.
The purpose of a data lake is to present an unrefined view of data to only the most highly skilled analysts, helping them explore their data-refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store (source: Gartner). Data lakes conceptually stem from the rise of unstructured data and horizontally scaled database technologies (e.g., Hadoop), and they have increasingly become the basis of much of the quickly growing science of artificial intelligence and cognitive computing.
This technology blueprint focuses on data exploration and governance by delivering a truly scalable, automated, and repeatable process for identifying sensitive data, capturing data lineage, and ensuring proper data use and access. In most cases, a data lake infrastructure better aligns richer, deeper data volumes and faster data velocity with predictive and automated decision-making. While much of the data collected may certainly not be predictive for every decision in a customer lifecycle, adding, for example, social media data to a marketing analytic or real-time travel data to a fraud analytic will enhance those use cases, sometimes dramatically.
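To make the idea of automated sensitive-data identification concrete, the sketch below scans raw records for common PII patterns. This is an illustrative example only, not part of any specific product: the pattern set, field names, and `scan_record` function are assumptions, and a production process would use far more robust detection (checksums, context, classifiers).

```python
import re

# Illustrative PII patterns (assumptions for this sketch, not an exhaustive set).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def scan_record(record: dict) -> dict:
    """Return {field: [pii_types]} for string fields matching a PII pattern."""
    findings = {}
    for field, value in record.items():
        hits = [name for name, pat in PII_PATTERNS.items()
                if isinstance(value, str) and pat.search(value)]
        if hits:
            findings[field] = hits
    return findings

# Example: flag which fields of a raw source record contain sensitive data.
record = {"name": "Jane Doe", "contact": "jane@example.com", "ssn": "123-45-6789"}
print(scan_record(record))  # → {'contact': ['email'], 'ssn': ['us_ssn']}
```

In a data lake setting, a scan like this would run as data lands in the lake, and its findings would feed the governance catalog that drives access controls and lineage tracking.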