In today's information-driven world, data has become an indispensable part of daily life. As cloud computing, the Internet, and mobile devices have become integral to our lives and activities, enormous amounts of data are generated every day (Hima Bindu et al., 2016). Social networking applications such as YouTube, Twitter, Facebook, LinkedIn, and WhatsApp, to name a few, are one example. The amount of data generated is growing exponentially, and estimates suggest that at least 2.5 quintillion bytes (2.5 × 10^18) of data are produced every day (Harish Kumar and Menakadevi, 2017). More data now crosses the Internet every second than was stored on the entire Internet 20 years earlier (McAfee and Brynjolfsson, 2012). Collections of datasets so large and complex that traditional relational database management systems struggle to manage them have given rise to the term Big Data (Shirudkar and Motwani, 2015).
The term is now used everywhere. Big Data (BD) is becoming increasingly prominent as the number of devices connected to the Internet of Things (IoT) continues to grow at unanticipated rates, producing large volumes of data that need to be transformed into valuable information (Moura and Serrão, 2015). The advent of BD has also brought new challenges for data security (Toshniwal et al., 2015). According to Toshniwal et al. (2015), there is a growing need to research technologies that can handle these large datasets and secure them efficiently. They add that current data protection technologies are slow when applied to huge amounts of data (Toshniwal et al., 2015, p. 17).
This means that security is of great importance in BD collection, processing, and analysis: the systems used must be fast yet secure. Ultimately, the purpose of BD security is no different from that of the fundamental CIA triad: the confidentiality, integrity, and availability of the generated data must be preserved. According to Tahboub and Saleh (2014), the need to protect information that represents a valuable asset of the organization cannot be emphasized enough. Data Loss Prevention (DLP) has been found to be one of the effective ways to prevent data loss. DLP solutions detect and prevent unauthorized attempts to copy or send sensitive data, whether intentional or unintentional, by individuals who are authorized to access that data. DLP is designed to detect potential data breach incidents in a timely manner by monitoring data while it is in use (endpoint actions), in motion (network traffic), or at rest (data storage) (Tahboub and Saleh, 2014). Protecting the BD process includes protecting the knowledge sources, the preprocessing, and the results. According to ISACA (2010), DLP aims to halt the leakage of sensitive information occurring in businesses globally. By locating, classifying, and tracking information at rest, in use, and in motion, DLP helps businesses manage the information they hold and stop the many information leaks that occur every day (ISACA, 2010). This research aims to design a method to help organizations prevent data leakage in big data.