Three Attack Vectors That Target Your Data
As you build your data science program, don't ignore the people working to disrupt your plans. Here are three attack vectors they can use and how to protect your data.
- By Troy Hiltbrand
- February 7, 2020
As companies implement data science programs, their goal is often to use technology and advanced analytics to create a competitive advantage. At the same time, there are others in the market who want to prevent their success. These adversaries range from industry competitors working in the same market segment to bad actors and hostile nation-states whose goal is to disrupt the economy and society in general. Whatever their intentions, these opponents want to stop a company's progress.
There are three major attack vectors used to disrupt a company's forward progress with its data science program: external data poisoning, internal data poisoning, and intellectual property theft.
External Data Poisoning
External data poisoning is the safest attack for an adversary because it can be carried out entirely from outside the boundaries of an organization.
In 2016, Microsoft deployed a state-of-the-art AI Twitter chatbot named Tay as an experiment in conversational understanding. The idea was to have the chatbot learn through interactions with the public and evolve over time to become smarter and more capable through playful communication. The experiment went horribly wrong: the Twitter community fed Tay garbage, and the result was a chatbot spewing misogynistic and racist banter. Microsoft quickly deactivated Tay and cleaned up the account.
This is a prime example of the concept of garbage in, garbage out. The model learned from a set of input data that was culturally corrupt and generated results that reflected this. A learning model is only as good as the data it is fed, and that opens up an attack vector. If a bad actor has an open interface to push data into your model, then it is susceptible to external data poisoning.
Models that use real-time data to evolve are most susceptible to this method of attack, and the inputs must be monitored and cleansed to ensure they do not have a negative impact on the model. Models that use a predefined training set of data at the time of construction are also susceptible, and care should be taken to cleanse the training data before starting modeling exercises.
Models most susceptible to this form of attack are those that are very sensitive to outliers in the data. These at-risk models include linear classifiers -- such as logistic regression and naive Bayes classifiers -- support vector machines, decision trees, boosted trees, random forests, neural networks, and nearest neighbors. Due to the way these models are trained, the introduction of outliers can significantly skew their results.
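To make the risk concrete, here is a minimal sketch -- using scikit-learn and synthetic data, both illustrative choices rather than a prescribed toolkit -- showing how a handful of poisoned rows can measurably shift a logistic regression model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Clean training data: two well-separated Gaussian clusters.
X_clean = np.vstack([rng.normal(-2, 1, size=(100, 2)),
                     rng.normal(2, 1, size=(100, 2))])
y_clean = np.array([0] * 100 + [1] * 100)

# Poisoned copy: five extreme outliers injected with the wrong label --
# roughly 2 percent of the rows.
X_poisoned = np.vstack([X_clean, np.full((5, 2), 30.0)])
y_poisoned = np.concatenate([y_clean, np.zeros(5, dtype=int)])

clean_model = LogisticRegression().fit(X_clean, y_clean)
poisoned_model = LogisticRegression().fit(X_poisoned, y_poisoned)

# The coefficients shift, and accuracy on the clean data can degrade,
# even though only a handful of rows were tampered with.
print("clean coefficients:   ", clean_model.coef_)
print("poisoned coefficients:", poisoned_model.coef_)
print("poisoned model, accuracy on clean data:",
      poisoned_model.score(X_clean, y_clean))
```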
Solution:
-- Implement a data quality program to ensure data inputs are valid, specifically focusing on outliers
-- Continuously scan and monitor your input training data (see the screening sketch below)
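One way to operationalize that second point is a screening pass that quarantines anomalous rows before they reach a training job. The following is a minimal sketch using a z-score test; the 3.0 threshold and the NumPy-based implementation are illustrative assumptions, not a universal rule:

```python
import numpy as np

def screen_outliers(X: np.ndarray, z_threshold: float = 3.0) -> np.ndarray:
    """Boolean mask of rows whose features all fall within z_threshold
    standard deviations of their column mean."""
    z_scores = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    return (z_scores < z_threshold).all(axis=1)

# Quarantine suspicious rows before they ever reach a training job.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(500, 3)),
               [[50.0, 50.0, 50.0]]])  # one obviously poisoned row
mask = screen_outliers(X)
X_trusted, X_quarantined = X[mask], X[~mask]
print(f"kept {mask.sum()} rows, quarantined {(~mask).sum()} for review")
```

Note that the column mean and standard deviation used here can themselves be dragged by a determined attacker; robust statistics (median and MAD) or a dedicated detector such as an isolation forest hold up better under contamination.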
Internal Data Poisoning
The second attack vector is internal data poisoning. This entails a bad actor breaching your perimeter security, directly accessing your corporate data, and altering that data to the point that it has a detrimental impact on your data science models.
When people start speaking of internal data and system manipulation, your mind might go to the 1995 Angelina Jolie classic Hackers, which takes the audience flying through a stylized representation of an internal network as the hackers strive to interrupt a company's computer operations. In reality, network penetration is far less cinematic than the movie portrays, but it can have significant negative consequences for business operations.
Network security teams across the world can attest that their networks are continually being scanned for gaps in security. When gaps are identified, bad actors exploit them to access resources that are intended to be protected -- including your historical data sets. If hackers can poison this data by altering it at the source, the derived models will produce weaker results or even become completely misleading and steer your company in the wrong direction.
To protect against this, network teams need to ensure that up-to-date systems are in place to secure the perimeter and that those systems are patched promptly to close recently identified security holes and zero-day exploits.
Direct attacks that penetrate the network perimeter are not the only option bad actors have at their disposal. We are seeing an increase in phishing and even targeted spear phishing, where misleading email messages trick an employee into installing malicious software on their computer. If the employee has the appropriate level of data access, that software can silently corrupt the data without the user even knowing it.
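One pragmatic control against this kind of silent corruption is to record checksums of known-good data snapshots and verify them before every training run. The sketch below assumes your data sets live as CSV files on disk; the directory layout and manifest format are hypothetical:

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large data sets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Record a known-good fingerprint for every data file."""
    hashes = {p.name: file_sha256(p) for p in sorted(data_dir.glob("*.csv"))}
    manifest.write_text(json.dumps(hashes, indent=2))

def verify_manifest(data_dir: Path, manifest: Path) -> list:
    """Return the names of files whose contents changed since the manifest."""
    expected = json.loads(manifest.read_text())
    return [name for name, digest in expected.items()
            if file_sha256(data_dir / name) != digest]

data_dir, manifest = Path("data"), Path("data_manifest.json")
# Run once at a trusted point in time: write_manifest(data_dir, manifest)
# Then, before every training run, fail loudly on unexpected changes:
tampered = verify_manifest(data_dir, manifest)
if tampered:
    raise RuntimeError(f"possible data tampering detected in: {tampered}")
```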
Malicious software can also prevent access to the data altogether. We have seen a rise in the use of ransomware, which encrypts data and demands payment in exchange for the decryption key. If this happens on your corporate network, you will be locked out of your data -- the fuel for your data science program. To protect yourself, it is important that your desktop and mobile devices have current and effective antivirus software. It is also important to train your employees to recognize and respond appropriately to phishing attempts.
Solution:
-- Implement modern perimeter security and desktop management solutions and keep their software patched and up to date
-- Train your employees to recognize and respond to phishing attacks
Model Theft
The third and final attack vector is the theft of your models. Unlike the theft of physical assets, the theft of models does not leave your company without a model to work with. It does, however, create a copy that can be studied and replicated by a competitor, eliminating the competitive advantage your company is striving for with its data science program. To protect against this, companies need to apply robust access controls to their models. This includes central storage and management of these models with user- and role-based protection to prevent unauthorized individuals from accessing them.
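As an illustration of what user- and role-based protection around a central model store might look like, here is a minimal sketch; the roles, registry class, and permission sets are hypothetical rather than any specific product's API:

```python
from dataclasses import dataclass, field

# Hypothetical roles; in practice these would come from your identity provider.
READ_ROLES = {"data_scientist", "ml_engineer", "model_admin"}
EXPORT_ROLES = {"model_admin"}

@dataclass
class ModelRegistry:
    """Toy in-memory stand-in for a centrally managed model store."""
    models: dict = field(default_factory=dict)

    def register(self, name: str, artifact: bytes) -> None:
        self.models[name] = artifact

    def fetch(self, name: str, user_roles: set) -> bytes:
        if not user_roles & READ_ROLES:
            raise PermissionError(f"no read access to model {name!r}")
        return self.models[name]

    def export(self, name: str, user_roles: set) -> bytes:
        # Exporting (copying a model out of the registry) is the riskier
        # operation, so it is limited to a narrower set of roles.
        if not user_roles & EXPORT_ROLES:
            raise PermissionError(f"no export access to model {name!r}")
        return self.models[name]

registry = ModelRegistry()
registry.register("churn_model_v3", b"...serialized model bytes...")
registry.fetch("churn_model_v3", user_roles={"data_scientist"})  # allowed
try:
    registry.export("churn_model_v3", user_roles={"data_scientist"})
except PermissionError as err:
    print(err)  # the model can be used in place but not copied out
```

In a real deployment this logic would sit in a model registry integrated with your identity provider, with export treated as a more privileged operation than read, because copying a model out of the registry is what enables wholesale theft.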
As is the case with any intellectual property, one of the most effective methods of stealing models is for a competitor to hire away the staff who built them. If they can hire your knowledge base away, they may be able to replicate that intellectual property and rob you of its future evolution and benefits.
The best way to protect your enterprise is through effective human resource processes. This includes having strong intellectual property agreements with your employees that prevent them from disclosing company secrets if they are lured away. Another way to prevent this kind of knowledge migration is to ensure your employees have a high level of job satisfaction, which can include strategically planning the compensation, benefits, and culture you offer. Strong company culture and employee loyalty can be one of the best protections against intellectual property theft.
Solution:
-- Implement central model storage and role-based access controls to protect the models
-- Build contractual intellectual property controls into employment contracts
-- Implement human resource programs that promote active, healthy employee engagement to minimize attrition to other companies
A Final Word
Although your data science program is heavily focused on identifying ways to use data to generate a competitive advantage, you need to be aware that others are looking for ways to disrupt your success. Whether your enemies are competitors or external bad actors, you can take steps to protect yourself and ensure your work has a lasting impact. If you focus on these attack vectors and work jointly with other departments, you can reduce the probability of an attack being successful and minimize the impact when an attack happens.