AI is proliferating, prolifically. Every enterprise vendor worth its salt (and probably plenty of those that aren't) has announced some AI integration, upgrade or service, with the vast majority also embracing the comparatively new world of generative Artificial Intelligence (gen-AI) with its innate ability to actually generate and not just predict.
But there's a problem. As smart as AI is, that's all it is: it is only as good as the data we expose it to, the information we allow it to ingest and the degree to which we build AI engines with contextual understanding and algorithmic excellence. What this means is that while organizations are adding AI assistants based on gen-AI Large Language Models (LLMs), most LLMs don't actually understand what's going on inside an enterprise. Why would they? They stem from wider external knowledge pools created in the open data universe across the open source fabric, so it would be unreasonable to imagine that they might understand every organization's unique datasets, jargon and internal knowledge.
An LLM trained on the web, across the cloud and within the open data arena might have lots of broad knowledge drawn from (mostly) publicly available resources, but we need to remember that the definition of a 'customer' or 'fiscal year' or some other key operational phrase always varies across companies.
Firing up the knowledge engine
Data and AI company Databricks says its new LakehouseIQ service solves those problems. It's less of an assistant and more of a 'knowledge engine' that learns what makes an organization's data, culture and operations unique. It uses generative AI to understand jargon, data usage patterns, organizational structures and so on in order to answer questions accurately, within the context of a business. It is, if you will, AI built on knowledge cemented to a business-specific use case.
Databricks is adamant that any employee in an organization can use LakehouseIQ to search, understand and query data in natural language. LakehouseIQ is integrated with Databricks Unity Catalog to help ensure that democratizing access to data adheres to internal security and governance rules.
"LakehouseIQ will help democratize data access for every company to improve decision-making and accelerate innovation. With LakehouseIQ, an employee can simply write a question and find the data they need for a project, or get answers to questions relevant to their company's operations. It removes the roadblocks inherent in traditional data tools and doesn't require programming skills," said Ali Ghodsi, co-founder and CEO at Databricks. "Every employee knows the questions to ask to improve their day-to-day work and ultimately their business. With LakehouseIQ, they have the power to quickly and accurately discover the answers."
We know that when employees need access to internal data to complete their tasks, many find it tough to get what they need quickly enough to perform timely analytics. LakehouseIQ is said to 'significantly enhance' Databricks' in-product Search function. The company says that its new search engine doesn't just find data, it interprets, aligns and presents it in an actionable, contextual format.
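To make the idea concrete, here is a minimal sketch of how a natural-language question might be grounded in business-specific definitions before being handed to a generic LLM. Everything here is hypothetical illustration: the glossary contents, function names and prompt format are invented for this example and are not the actual LakehouseIQ API.

```python
# Illustrative sketch only: grounding a generic LLM prompt in
# company-specific definitions so that terms like 'customer' and
# 'fiscal year' mean what this business means by them.
# All names and definitions below are hypothetical.

COMPANY_GLOSSARY = {
    "customer": "accounts with at least one paid order (table: sales.active_customers)",
    "fiscal year": "February 1 through January 31",
}

def build_prompt(question: str) -> str:
    """Prepend the business glossary so the model interprets jargon correctly."""
    glossary = "\n".join(
        f"- {term}: {meaning}" for term, meaning in COMPANY_GLOSSARY.items()
    )
    return (
        "Translate the question into SQL using these company definitions:\n"
        f"{glossary}\n"
        f"Question: {question}\n"
        "SQL:"
    )

prompt = build_prompt("How many customers did we add this fiscal year?")
print(prompt)
```

A generic model asked the same question without this grounding would have to guess which tables exist and when the fiscal year starts, which is exactly the failure mode the article describes.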
LLM: No hablo jargon & acronyms
“Whether the CEO is trying to build quarterly sales forecasts or a marketer is attempting to analyze campaign performance, knowledge workers rely on a small team of over-worked data scientists and programmers to find and query the relevant data sets. This bottleneck prevents businesses from truly embracing data and AI. Large Language Models (LLMs) promised to fix this problem, but so far, the results have been disappointing,” proposes Ghodsi and team. “General purpose models don’t understand the unique language of every business: they cannot process jargon or internal acronyms; they are not trained on the company’s unique data sets; and they do not understand organizational charts or know which teams should have access to what information.”
LakehouseIQ learns from these business-unique signals within an organization, using schemas, documents, queries, popularity measures, lineage, data science notebooks (not the laptop kind) and business intelligence (BI) dashboards to become cumulatively smarter as it answers more queries.
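The idea of learning from usage signals can be sketched as a simple ranking problem: surface the tables that are queried often, referenced in dashboards and well-connected in lineage. The field names and weights below are invented for the sketch, not Databricks internals.

```python
# Hypothetical illustration of ranking candidate tables by usage
# signals (query popularity, dashboard references, lineage).
# Weights and field names are assumptions made for this example.

def rank_tables(candidates):
    """Score tables so frequently used, well-connected data surfaces first."""
    def score(t):
        return (
            2.0 * t["query_count"]        # popularity: how often it is queried
            + 1.5 * t["dashboard_refs"]   # appears in BI dashboards
            + 1.0 * t["lineage_children"] # downstream tables derived from it
        )
    return sorted(candidates, key=score, reverse=True)

tables = [
    {"name": "sales.orders_raw",   "query_count": 3,  "dashboard_refs": 0, "lineage_children": 5},
    {"name": "sales.orders_clean", "query_count": 40, "dashboard_refs": 6, "lineage_children": 2},
]
best = rank_tables(tables)[0]["name"]
print(best)  # the heavily queried, dashboard-backed table ranks first
```

In a real system these signals would accumulate over time, which is presumably what Databricks means by the engine becoming "cumulatively smarter" as it answers more queries.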
Because this technology understands the specifics of an organization's own business jargon in the context of where it is used (in terms of which applications and digital services it appears in), it can interpret the intent of a question. Databricks further claims that it is capable of generating additional insights that could spur new questions or lines of thinking. LakehouseIQ does all of this while being governed by Unity Catalog, Databricks' own solution for unified search and governance across data, analytics and AI.
“LakehouseIQ solves two of the biggest challenges that businesses face in using AI: getting employees the right data while staying compliant and keeping data private when it should be,” said CEO Ghodsi. “Organizations can be confident that their employees will only have access to the data they are authorized to use, so increasing data accessibility doesn’t increase risk. It alleviates time-strapped engineers, eases the burden of data management, and empowers employees to take advantage of the AI revolution without jeopardizing the company’s proprietary information.”
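The governance principle Ghodsi describes, that employees only ever see data they are authorized to use, can be sketched as a permission filter applied before any question is answered. This is purely illustrative: in the product, enforcement happens inside Unity Catalog, not in client code like this, and the users and table names are invented.

```python
# Sketch of permission-aware data access: restrict the tables a
# question can draw on to those the asking user may query.
# Users, tables and the PERMISSIONS mapping are hypothetical.

PERMISSIONS = {
    "alice": {"sales.orders", "marketing.campaigns"},
    "bob": {"marketing.campaigns"},
}

def visible_tables(user: str, requested: set) -> set:
    """Return only the tables this user is authorized to query."""
    return requested & PERMISSIONS.get(user, set())

# Bob asks a question that would touch sales data he cannot see:
allowed = visible_tables("bob", {"sales.orders", "marketing.campaigns"})
print(allowed)  # only the marketing table survives the filter
```

The point of doing this before the AI ever reads the data is the one the quote makes: increasing data accessibility should not increase risk.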
Lakehouse expansion
Databricks notes that it also continues to expand its Lakehouse Platform, recently announcing Lakehouse Apps and its Databricks Marketplace, plus a suite of data-centric AI tools for building and governing LLMs on the lakehouse.
Looking at the context-based specifics of the company's platform progression (as Databricks would surely insist that we do, given its own insistence on context when it comes to AI), there is a clear move here to provide organizations with extra tooling that sounds almost simple but in fact stems from deeply intelligent planning at the software architecture level.
As we now build the walls of digital business with AI from sources such as Databricks, we will need a new form of cemented concrete to bind AI to more carefully precision-engineered workflow tasks. Whether it's gypsum, lime, silica, alumina and iron oxide, or it is context-specific AI to surpass and augment LLM generalization, the mixer is on.