Meet the Experts
Merv Adrian, Gartner
Wake Up. Your Data Lake is the Biggest Threat to the Security of Your Data
This isn't meant as a knock against information security teams, but a recognition that in most organizations the business now moves faster than the security team can keep up. Security expertise on data lakes may be nascent in an organization; more likely it doesn't exist at all. Making data lakes more secure therefore requires a mentality of watching out for yourself. There may not be a lifeguard around to help. If your organization is going to leverage data lakes, then everyone needs to be able to swim.
The risks managed in traditional databases and data warehouses are not limited to confidentiality, integrity and availability. Because of the type of data involved and the capabilities these systems support, the data may also affect risks related to privacy, safety and reliability. So new issues are arising even there, but at least there is a platform from which to address them. The problem with data lakes is that they use much less mature data stores, even plain file systems. Without remediation, they will leak on all sides. A few of the issues:
- The complexity of a data lake typically leads to a poorly managed environment and a larger attack surface. For example:
- Hadoop has a lot of components, which introduce complexity. Which components can be patched, when, and in what sequence?
- Even business-critical systems tend to be made available for patching very infrequently. In a data lake, common management of versions, patches and configurations is likely absent.
- The security team's existing vulnerability scanning tools may not be able to monitor data lakes.
- Use of open source versions, e.g., Hadoop, instead of a commercial distribution may delay security patches.
- Privileged administration at the physical and platform layers is not centrally managed, resulting in inappropriate access (e.g., accounts and rights are never removed).
- Poor change management can lead to availability issues, and may even introduce new (or unknown) vulnerabilities into a data lake.
- Event logs may be missing or not centrally collected. At worst, there is no visibility at all, which hurts most when an incident needs to be investigated. If logs are generated but not centrally collected, their integrity and availability could become an issue.
- Ultimately, breaches WILL occur. Few data lakes have detection or remediation strategies in place.
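To make the point about privileged accounts that are never removed more concrete, here is a minimal sketch of the kind of check a team could run for itself. Everything in it is an assumption for illustration: the `PrivilegedAccount` record, the `find_stale_accounts` function, the 90-day idle threshold, and the idea that you can export a list of accounts and a list of current staff at all. It is not tied to Hadoop or to any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PrivilegedAccount:
    """Hypothetical record for one admin account on a data lake node."""
    name: str
    owner: str          # person responsible for the account
    last_used: date     # date of the most recent login


def find_stale_accounts(accounts, active_owners, today, max_idle_days=90):
    """Return names of accounts that should be reviewed for removal.

    An account is flagged when its owner is no longer in the set of
    active staff, or when it has not been used within max_idle_days.
    The threshold is an illustrative default, not a recommendation.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return [
        account.name
        for account in accounts
        if account.owner not in active_owners or account.last_used < cutoff
    ]
```

A review like this is only as good as its inputs. If the account list has to be gathered by hand from each node, that in itself is a sign that privileged administration is not centrally managed.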