Common SQL, Open Lakehouses, and Open Source Managed Services: What Will Be Cool in 2023
A closer look at three trends that will help enterprises get the most from their data.
- By Steven Mih
- December 16, 2022
Over the last year, we’ve seen new phases of growth in the cloud: architecting for performance, open data lakes augmenting the cloud data warehouse, and out-of-the-box cloud solutions driving innovation. In 2023, we’ll see three shifts: end-user experience will become a top priority as deep integration of data platforms becomes standard, an industry-accepted open lakehouse stack will emerge, and open source SaaS will shift toward open source managed services.
Trend #1: A common SQL dialect will emerge
As companies grow, so does their data, and they look to make more data-driven decisions to stay competitive. Most organizations rely on a handful of database management systems for batch and interactive analytics workloads on historical, near real-time, and streaming data.
Although SQL is the lingua franca for understanding data, each system speaks its own dialect of it. Users need to know which system they’re using and write their SQL accordingly. An analyst may write SQL that tells a particular data system the specific steps needed to get an insight from the data. When that user wants to move to a different data management system, the code has to be ported to the other system’s SQL dialect and tested. This can happen even if both systems are compatible with the ANSI SQL standard.
For example, if an analyst uses a tool for interactive analysis and then moves that code into “production” on a different system for a batch use case, the code needs testing because there may be underlying semantic differences. As data platform integration improves, there will be ways to code against a common SQL dialect and run that code on different systems with the same expected results. This requires the data platform as a whole to be abstracted away from the end user. Instead of worrying about the underlying engines and data structures, end users will be able to use powerful underlying engines, seamlessly and declaratively, for interactive, batch, real-time, streaming, and ML workloads.
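To make the porting problem concrete, here is a minimal sketch of one logical request ("give me the first n rows of a table") rendered in the row-limiting syntax of a few widely used systems. The function name `first_rows` and the template table are hypothetical and exist only for illustration; a real common-dialect layer would have to handle far more than this one clause, including the semantic differences mentioned above.

```python
# Illustrative sketch only: one logical query, several dialect spellings.
# ROW_LIMIT_TEMPLATES and first_rows are hypothetical names, not a real library.
ROW_LIMIT_TEMPLATES = {
    "postgresql": "SELECT * FROM {table} LIMIT {n}",
    "mysql": "SELECT * FROM {table} LIMIT {n}",
    # SQL Server puts the row limit right after SELECT.
    "sqlserver": "SELECT TOP {n} * FROM {table}",
    # Oracle 12c and later accept the ANSI-style FETCH FIRST clause.
    "oracle": "SELECT * FROM {table} FETCH FIRST {n} ROWS ONLY",
}


def first_rows(dialect: str, table: str, n: int) -> str:
    """Return a 'first n rows' query written for the given dialect."""
    try:
        template = ROW_LIMIT_TEMPLATES[dialect]
    except KeyError:
        raise ValueError(f"no template for dialect {dialect!r}") from None
    return template.format(table=table, n=int(n))


if __name__ == "__main__":
    # The analyst's intent is identical in every case; only the spelling changes.
    for dialect in ROW_LIMIT_TEMPLATES:
        print(f"{dialect:>10}: {first_rows(dialect, 'orders', 10)}")
```

With a common dialect, the analyst would write the query once and the platform, rather than the user, would take on this kind of per-engine translation.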
Pro tip: Determine when and where your SQL is first written by your data users and try to find platform integrations to make that first SQL dialect the standard. For most organizations, that will be at the exploratory or insight discovery stage.
Trend #2: Customers will look for the most open lakehouse stacks
With the macroeconomic outlook for 2023 looking bleak, more companies will try to embrace powerful open alternatives to reduce lock-in. With every piece of the software stack, the industry has seen open source alternatives emerge to challenge proprietary incumbents. We’ve all seen how Linux started out small and ended up becoming the foundation for the most widely used OS in the world. This is happening in the new lakehouse area: there are proprietary vendors offering their version of the lakehouse and there are open source alternatives.
Most of the open source alternatives come from hyperscalers such as Meta, Google, and Uber, which needed to build data-driven systems to suit their organizations’ unprecedented speed and scale. With the open source lakehouse, there’s an alphabet soup of projects available for you to choose from, all with varying degrees of openness. There are open options for file formats, table formats, computing engines, access control, and system interfaces. You can choose from the options and stitch them together, but the seams will still show. In the coming year, you’ll start to see the industry coalesce around three or four lakehouse stacks. This is good news for companies that want a simple, powerful, yet open lakehouse platform.
Pro tip: Determine what level of openness you can accept across your stack. Because there are varying degrees of open technologies to choose from, look at where your infrastructure spending will go. In most cases, that will be driven by your CPU requirements, not your storage. If that’s true for you, find ways to ensure that part of your spending has the least lock-in. Choosing vendor-neutral open source projects is a good place to start.
Trend #3: Adoption of open source will shift from SaaS to open source managed services
Vendors of open source data ecosystem projects have moved to provide more than JAR files and other software bits. Many have added SaaS offerings that may be easy to use but inherently make it difficult to extricate your data, which can be a risk to your company. As IT departments demand more control of their data and reduce the copies held in third-party SaaS systems, we’ll see the adoption of more cloud-native managed services instead of full SaaS solutions. To delineate this further for a distributed system like a lakehouse, you can ask who is responsible for each of three “planes”: the control plane, the compute plane, and the storage plane.
Pro tip: Look for open source managed services that keep your data in your VPC for data storage, compute, and user access. If your end users are getting data from the vendor portal, there is a higher data extrication risk.
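One way to apply the three-plane question when comparing offerings is to write down where each plane runs and flag anything data-bearing that sits outside your VPC. The sketch below is a hypothetical checklist, not a vendor assessment tool: the `Deployment` fields, the two example offerings, and the assumption that a managed service runs only its control plane on the vendor side are all illustrative.

```python
# Illustrative sketch only: a checklist for the "three planes" question.
# All names and the two example deployments are hypothetical.
from dataclasses import dataclass
from enum import Enum
from typing import List


class Location(Enum):
    CUSTOMER_VPC = "customer VPC"
    VENDOR_CLOUD = "vendor cloud"


@dataclass(frozen=True)
class Deployment:
    name: str
    control_plane: Location
    compute_plane: Location
    storage_plane: Location
    user_access: Location  # where end users actually retrieve their data


def data_outside_vpc(deployment: Deployment) -> List[str]:
    """List the data-bearing parts that run outside the customer's VPC."""
    checks = {
        "compute plane": deployment.compute_plane,
        "storage plane": deployment.storage_plane,
        "user access": deployment.user_access,
    }
    return [part for part, where in checks.items()
            if where is not Location.CUSTOMER_VPC]


# Assumed shape of a full SaaS offering: everything runs on the vendor side.
full_saas = Deployment("full SaaS", Location.VENDOR_CLOUD, Location.VENDOR_CLOUD,
                       Location.VENDOR_CLOUD, Location.VENDOR_CLOUD)

# Assumed shape of a managed service: only the control plane is vendor-run.
managed = Deployment("managed service", Location.VENDOR_CLOUD, Location.CUSTOMER_VPC,
                     Location.CUSTOMER_VPC, Location.CUSTOMER_VPC)

if __name__ == "__main__":
    for d in (full_saas, managed):
        flagged = data_outside_vpc(d)
        print(f"{d.name}: {', '.join(flagged) if flagged else 'data stays in your VPC'}")
```

The control plane is deliberately left out of the check, on the assumption that it handles orchestration rather than your data; if a vendor's control plane does touch data, it belongs in the list too.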
About the Author
Steven Mih is the co-founder and CEO of Ahana. Steven brings over 20 years of experience in sales, business development, and marketing of enterprise technology solutions to Ahana. In addition to his role as CEO, Steven is a Presto Foundation board member. Prior to Ahana, Steven was CEO of Alluxio and Aviatrix. His multifaceted go-to-market experience spans leading additional organizations including Couchbase, Transitive, and Cadence Design Systems. Steven started his career as a field sales engineer at AMD. He holds a B.S. in electrical engineering from UC San Diego. You can contact the author via LinkedIn or Twitter.