5 NoSQL Alternatives for Data Storage
Organizational needs are changing, and data professionals are rethinking what they really need from an analytics database. A new generation of NoSQL databases offers alternatives for storing and managing data.
- By Troy Hiltbrand
- July 19, 2016
In the world of data warehousing and analytics, relational database management systems (RDBMSes) have reigned supreme for years. This is due in part to their adherence to the ACID properties: atomicity, consistency, isolation, and durability. These properties have assured data professionals that transactions in the database are solid and reliable.
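To make atomicity concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in RDBMS. The engine choice, the `accounts` table, and the `make_db`, `transfer`, and `balances` helpers are illustrative assumptions, not part of any particular warehouse. A transfer updates two rows; if the second update violates a constraint, the whole transaction rolls back and neither change survives.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two accounts; balances may not go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn):
    """Current balance per account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

For example, on a fresh database `transfer(conn, "bob", "alice", 500)` returns `False` and leaves both balances untouched, even though the credit to alice ran before the failing debit.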
The challenge is that as data scales up, these characteristics start to become a hindrance. Keeping all four ACID properties intact as machines scale vertically requires increasingly complex software architecture.
At times, an organization wants to scale horizontally instead, adding more small-scale machines across which the load can be distributed more effectively. Horizontal scaling lets organizations grow capacity frequently and in small increments rather than waiting for large changes such as a new platform. As organizations consider changing their architecture to take advantage of the cloud and the proliferation of small, easily duplicated servers, they start to question whether ACID is as important as it once was in the context of analytics.
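One common technique for spreading data across horizontally added servers is consistent hashing, which Dynamo-style systems such as Cassandra use. The sketch below is a simplified, assumed illustration (the `HashRing` class, `stable_hash`, and the node names are hypothetical): each key maps to a server, and adding a server moves only a fraction of the keys, all of them onto the new server, instead of reshuffling everything.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Hash that is stable across processes (Python's built-in hash() for str is salted)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Simplified consistent-hashing ring that assigns keys to servers."""

    def __init__(self, nodes=(), points_per_node=100):
        self.points_per_node = points_per_node  # virtual points smooth out the load
        self._ring = []                          # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no servers in the ring")
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        index = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

With plain modulo placement (`hash(key) % n`), going from three servers to four would remap about three quarters of the keys; the ring keeps growth incremental, which is the point of scaling out in small steps.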
New Options
As data needs change, new software technologies are available to store and manage data and provide this horizontal scaling. Many of these database engines have been lumped into a category called NoSQL. The term can be misleading; it often leads people to believe that NoSQL is a replacement for SQL (or the RDBMS).
In fact, NoSQL is commonly read as "not only SQL," and it augments the analytics professional's toolbox rather than replacing existing technology. In other words, NoSQL databases store data in structures other than relational tables and are not limited to SQL as the mechanism for retrieving it.
Unlike an RDBMS, NoSQL databases do not always implement all parts of the ACID model. Many of them instead follow a principle known as BASE (basically available, soft state, eventually consistent). These systems scale horizontally across multiple servers that work in the background to make each instance eventually consistent, but there is no guarantee that every instance is a mirror copy of the others at any given moment.
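The toy model below is an assumption for illustration, not how any particular NoSQL product works internally. It shows the eventual-consistency behavior just described: a write is accepted immediately by one replica, the other replicas see it only after background replication runs, and all replicas converge once pending updates are delivered. Conflicting writes are resolved last-write-wins using a version counter, one of several strategies real systems use; the class and method names are hypothetical.

```python
from collections import deque
from itertools import count


class EventuallyConsistentStore:
    """Toy replicated store: writes hit one replica, then propagate asynchronously."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}  # key -> (version, value)
        self._pending = deque()  # replication messages not yet delivered
        self._clock = count(1)   # monotonically increasing write versions

    def write(self, replica_name, key, value):
        version = next(self._clock)
        self._apply(replica_name, key, version, value)  # accepted right away
        for name in self.replicas:
            if name != replica_name:
                self._pending.append((name, key, version, value))

    def _apply(self, replica_name, key, version, value):
        current = self.replicas[replica_name].get(key)
        if current is None or version > current[0]:  # last write wins
            self.replicas[replica_name][key] = (version, value)

    def read(self, replica_name, key):
        entry = self.replicas[replica_name].get(key)
        return None if entry is None else entry[1]

    def sync(self):
        """Deliver every queued update; afterwards all replicas hold the same data."""
        while self._pending:
            self._apply(*self._pending.popleft())

    def is_consistent(self):
        states = list(self.replicas.values())
        return all(state == states[0] for state in states)
```

Between a `write` and the next `sync`, reading the same key from another replica can return stale data (or nothing); that window is exactly what eventual consistency permits.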
Knowing the Trends
In analytics processing and data aggregation, there are times when this eventual consistency provides sufficient accuracy to make decisions. When doing long-term analysis, knowing the direction and magnitude of trends is more important than ensuring that every transaction is intact.
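As a rough worked illustration, using hypothetical daily order counts invented for this example, compare the trend computed on a fully up-to-date node with the trend computed on a replica that has not yet received some of the latest day's writes. The least-squares slope (the `slope` helper is an assumption of this sketch) has the same direction and nearly the same magnitude in both cases.

```python
def slope(values):
    """Ordinary least-squares slope of values against day index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Hypothetical daily order counts (illustrative numbers, not real data).
primary = [120, 131, 128, 140, 152, 149, 163]
# The same series on a replica still missing five of the last day's orders.
replica = [120, 131, 128, 140, 152, 149, 158]
```

Here the primary's slope is 6.75 orders per day and the lagging replica's is about 6.21: the same upward direction, within roughly 8 percent.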
Don't misunderstand: these systems are not so loose that they leave huge data holes and inconsistencies. For these databases, performance and high availability are simply the priorities, and real-time consistency between any two instances in the architecture is secondary; hence the principle of eventual consistency.
Even as architectures change and data analytics evolves, the RDBMS continues to be a staple, but analytics professionals need to understand these NoSQL databases that can round out their toolbox. Below is a breakdown of some of the most popular NoSQL options.
MongoDB: MongoDB is a document-store database designed to support the rapid development of Web applications and Internet infrastructure. Its data model and persistence mechanisms are built for high read-and-write throughput and to scale easily with automatic failover. Rather than relying on relationships between tables to manage data, MongoDB stores JSON-like documents (internally encoded as BSON, a binary form of JSON).
These JSON documents can have embedded structure, and attributes within that structure can be queried. Unlike relational databases, which define structure up front in a schema, document-store databases such as MongoDB embed the structure in the documents themselves and rely on the application front end to define it.
This establishes a structure-on-query paradigm: diverse documents can be stored together, and the front end determines what gets queried on read. Thus, MongoDB is flexible, capable of organic growth, and useful when the data structure is not completely known at the beginning of the analytics process.
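The sketch below mimics, in plain Python, how a document store can hold differently shaped documents in one collection and resolve structure only at query time. The `find` and `get_path` helpers and the sample `customers` documents are assumptions for illustration, and only equality matching is supported. With the real PyMongo driver, a comparable query would be `collection.find({"address.city": "Boise"})`, using MongoDB's dot notation for embedded fields.

```python
from typing import Any

_MISSING = object()  # sentinel that never equals a real value


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(collection: list, query: dict) -> list:
    """Return documents whose fields equal every value in the query.

    Documents lacking a queried field simply don't match; no schema is enforced.
    """
    return [
        doc for doc in collection
        if all(get_path(doc, path) == value for path, value in query.items())
    ]


# Hypothetical documents with different shapes stored side by side.
customers = [
    {"_id": 1, "name": "Ada", "address": {"city": "Boise", "zip": "83702"}},
    {"_id": 2, "name": "Lin", "tags": ["vip"]},
    {"_id": 3, "name": "Sam", "address": {"city": "Denver"}, "loyalty": {"tier": "gold"}},
]
```

Because structure is interpreted on read, `find(customers, {"loyalty.tier": "gold"})` works even though only one document has a `loyalty` field.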
Redis: Redis is an in-memory key-value store. Each record is a combination of a key and a value, similar to a hash table or a table with two columns (ID and value). Because it keeps its data in memory, Redis is very fast at retrieving a value by its key.
This data can be simple, as in the case of a single data value associated with a single key, or the key can point to a complex structure that is retrieved in its entirety and parsed by the client. A common use of Redis is to store post-processed and aggregated data that needs to be recalled very quickly.
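The in-process class below is a hypothetical stand-in for the Redis caching pattern just described: an aggregated result is serialized under a key, fetched whole, and parsed by the client, optionally with an expiry. It is not the Redis API; with the redis-py client the comparable calls would be `r.set(key, value, ex=seconds)` and `r.get(key)`. The `KeyValueCache` name, the key naming scheme, and the injectable `clock` parameter are assumptions made so the example can be tested deterministically.

```python
import json
import time


class KeyValueCache:
    """In-process stand-in for a Redis-style key-value store with optional expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (expires_at or None, serialized value)
        self._clock = clock  # injectable so expiry can be tested

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        # Complex values are stored whole as JSON; the client parses them on read.
        self._data[key] = (expires_at, json.dumps(value))

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired key
            return None
        return json.loads(payload)
```

A typical use is `cache.set("sales:2016-07:by_region", {"west": 1520, "east": 1310}, ttl_seconds=60)` after a batch aggregation, so dashboards can read the precomputed totals without rerunning the query.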
Elasticsearch: Elasticsearch was developed to optimize search capabilities within a database. JSON documents stored in Elasticsearch are indexed for fast retrieval. Elasticsearch is built on top of the Apache Lucene search library, making it very flexible for finding data using pattern matching and combinations of searches. When searching across attributes and within attributes is important, Elasticsearch can provide a database/search-engine hybrid.
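Lucene, and therefore Elasticsearch, relies on an inverted index that maps each term to the documents containing it. The sketch below is a greatly simplified, hypothetical version with no relevance scoring, stemming, or pattern matching; the `InvertedIndex` class, its `tokenize` analyzer, and the sample documents are assumptions. It indexes the string fields of JSON-like documents and answers queries within one field or across all of them.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit (a simplistic analyzer)."""
    return [term for term in re.split(r"[^a-z0-9]+", text.lower()) if term]


class InvertedIndex:
    """Toy inverted index keyed by (field, term), standing in for a Lucene index."""

    def __init__(self):
        self.docs = {}                     # doc id -> original document
        self._postings = defaultdict(set)  # (field, term) -> ids of docs containing it

    def add(self, doc_id, doc):
        self.docs[doc_id] = doc
        for field, value in doc.items():
            if isinstance(value, str):     # only text fields are indexed here
                for term in tokenize(value):
                    self._postings[(field, term)].add(doc_id)

    def search(self, text, fields=None):
        """Ids of docs where at least one searched field contains every query term."""
        terms = tokenize(text)
        if not terms:
            return set()
        if fields is None:
            fields = {field for field, _ in self._postings}
        matches = set()
        for field in fields:
            hits = set(self._postings.get((field, terms[0]), ()))
            for term in terms[1:]:
                hits &= self._postings.get((field, term), set())
            matches |= hits
        return matches


index = InvertedIndex()
index.add(1, {"title": "Scaling NoSQL databases", "body": "Horizontal scaling with eventual consistency"})
index.add(2, {"title": "Relational design", "body": "ACID transactions and scaling up"})
index.add(3, {"title": "Search engines", "body": "Inverted indexes power full-text search"})
```

Lookups touch only the posting sets for the query terms, so `index.search("scaling")` finds documents 1 and 2 without scanning any text, while `index.search("scaling", fields=["title"])` narrows the match to document 1.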
Cassandra: Cassandra is a decentralized, distributed wide-column store. Created at Facebook and built on a combination of Google's Bigtable data model and Amazon's Dynamo distribution design, Cassandra is built for scalability. Rather than requiring every row to carry the same fixed set of columns, Cassandra organizes data into column families in which each row stores only the columns it actually has.
This model allows for sparse data, meaning that not every row has every column. If a row lacks a column, no data is stored for it, which keeps storage compact and efficient. This makes Cassandra well suited to data with many attributes, where some instances have values for those attributes and others do not.
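The toy table below illustrates the sparse-storage idea only; it does not reflect Cassandra's actual on-disk format or its CQL interface. The `SparseTable` class and the sensor readings are hypothetical. Each row stores only the columns it has, and the helpers count how many cells that saves compared with a fixed-schema table that would hold nulls.

```python
from collections import defaultdict


class SparseTable:
    """Toy wide-column table: each row keeps only the columns it actually has."""

    def __init__(self):
        self._rows = defaultdict(dict)  # row key -> {column name: value}

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self):
        """Union of column names across all rows (what a fixed schema would need)."""
        return sorted({c for row in self._rows.values() for c in row})

    def stored_cells(self):
        return sum(len(row) for row in self._rows.values())

    def dense_cells(self):
        """Cells a fixed-schema table would allocate, nulls included."""
        return len(self._rows) * len(self.columns())


# Hypothetical sensors that report different subsets of measurements.
table = SparseTable()
table.put("sensor-1", "temperature", 21.5)
table.put("sensor-1", "humidity", 40)
table.put("sensor-2", "temperature", 19.0)
table.put("sensor-3", "pressure", 1013)
table.put("sensor-3", "wind_speed", 12)
```

With these five readings, the sparse layout stores 5 cells, while a fixed four-column table over the same three rows would allocate 12, most of them null.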
Neo4j: Neo4j is a graph database built around the relationships in data. Data is entered as nodes with attributes and relationships, also called edges, between those nodes. This allows for complex network analysis using the Cypher query language. If relationships are important in your data, Neo4j can be a strong platform.
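As a rough illustration of relationship-centric querying, the sketch below models a small property graph in plain Python and finds the shortest path between two nodes with breadth-first search. The `Graph` class, the people, and the relationship types are hypothetical. In Neo4j itself, a roughly comparable question would be written in Cypher, for example `MATCH p = shortestPath((a:Person {name: 'ana'})-[:KNOWS*]->(b:Person {name: 'eve'})) RETURN p`.

```python
from collections import deque


class Graph:
    """Minimal property graph: nodes with attributes, typed directed edges."""

    def __init__(self):
        self.nodes = {}  # node id -> attribute dict
        self.edges = {}  # node id -> list of (relationship type, target id)

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs
        self.edges.setdefault(node_id, [])

    def add_edge(self, src, rel, dst):
        self.edges[src].append((rel, dst))

    def shortest_path(self, start, goal, rel=None):
        """Breadth-first search over edges (optionally of one type); returns node ids or None."""
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:  # walk back to the start
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for edge_rel, neighbor in self.edges.get(node, []):
                if (rel is None or edge_rel == rel) and neighbor not in previous:
                    previous[neighbor] = node
                    queue.append(neighbor)
        return None


g = Graph()
for name, city in [("ana", "Boise"), ("ben", "Denver"), ("cy", "Boise"), ("eve", "Austin")]:
    g.add_node(name, city=city)
g.add_edge("ana", "KNOWS", "ben")
g.add_edge("ben", "KNOWS", "cy")
g.add_edge("cy", "KNOWS", "eve")
g.add_edge("ana", "WORKS_WITH", "eve")
```

Over all edges, the shortest path from ana to eve is the direct `WORKS_WITH` link; restricting the search to `KNOWS` edges yields ana → ben → cy → eve, showing how the relationship type changes the answer.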
A Final Word
NoSQL does not have to be a replacement in all cases for your analytics data storage needs, but understanding these technologies is important. As your data scales and your needs change, these five popular NoSQL databases provide options to augment your current RDBMS platform.
About the Author
Troy Hiltbrand is the senior vice president of digital product management and analytics at Partner.co where he is responsible for its enterprise analytics and digital product strategy. You can reach the author via email.