Semantic Search + ChatGPT

ChatGPT writes. Hebbia reads. Why they’re a great pair.

Generative AI models seem poised to create trillions of dollars of economic value.

GPT-3, Stable Diffusion, DALL-E, and most recently ChatGPT have thrust AI’s ability to create into the mainstream.

But creations alone are often a long way from being helpful. ChatGPT’s incorrect answers and the immediate takedown of Meta’s Galactica share a glaring flaw:

Generative models can write responses, but they never read sources or “cite” their work.

OpenAI itself concedes that “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers” and that “there’s currently no source of truth.” Meta likewise notes that “there are no guarantees for truthful or reliable output from language models, even large ones trained on high-quality data like Galactica” and that one should “never follow advice from a large language model without verification.”

For instance, @AndrewYNg’s very first query drew a factually incorrect answer from ChatGPT:

ChatGPT “hallucinates” an incorrect response. GPUs are actually faster than CPUs for ML.

ChatGPT’s “search” functionality does not actually search: answers are produced by pattern matching over training data, not by retrieval, and they are often incorrect.

To realize generative AI’s full promise in the workforce, more transparent, trustworthy, and reliably accurate systems are needed. Technology that can correctly feed these models relevant facts is paramount.

Semantic search, the ability for LLMs to read and retrieve sources, is key.

In addition to their generative ability, LLMs can also encode and index content by its meaning.

Semantic or “neural” search engines leverage AI to retrieve answers based on meaning, not just keywords (the approach behind engines like Google and Elasticsearch). An analyst searching for “company cultural values” would expect to land on the company’s mission and vision in a 10-K; keyword search might instead yield results about company valuations. Only a semantic search engine like Hebbia can support that behavior.
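To make this concrete, here is a minimal sketch of meaning-based retrieval using the open-source sentence-transformers library. The model name and the toy 10-K passages are illustrative assumptions for this post, not a description of Hebbia’s actual stack.

    # Minimal semantic-search sketch (illustrative; not Hebbia's stack).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Hypothetical passages from a 10-K filing.
    passages = [
        "Our mission is to empower analysts with trustworthy software.",
        "Enterprise valuation multiples expanded during the quarter.",
        "We value integrity, curiosity, and ownership in all we do.",
    ]
    passage_embeddings = model.encode(passages, convert_to_tensor=True)

    # Encode the query into the same vector space and rank by meaning,
    # not by keyword overlap.
    query_embedding = model.encode("company cultural values", convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, passage_embeddings)[0]
    for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
        print(f"{score:.3f}  {passage}")

The mission and values passages score highest even though neither contains the literal phrase “cultural values,” while a keyword engine matching on shared stems like “valu-” could surface the valuation passage instead.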

Hebbia finds correct information within primary sources.

These discriminative AI systems, unlike generative AI, find the signal in the noise.

Generative models can, and should, leverage a semantic search “memory”.

Semantic search “readers” can feed generative “writers” to address the shortcomings of generative models alone (see the sketch after the list below):

  1. More accurate: prime models with relevant primary sources.
  2. More trustworthy: cite the sources behind every generation.
  3. More easily updated: an index can be refreshed almost instantaneously, with no need to retrain a billion (or two trillion!) parameter model ad hoc.
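Here is a minimal sketch of that reader-feeds-writer loop, reusing the model, passages, and embeddings from the snippet above. The llm_generate function is a hypothetical stand-in for whatever generative model sits on the other end; it is an assumption for illustration, not Hebbia’s implementation.

    # Reader feeds writer: retrieve sources, then generate with citations.
    def retrieve(query, k=2):
        """Return the top-k (source_id, passage) pairs for a query."""
        query_embedding = model.encode(query, convert_to_tensor=True)
        scores = util.cos_sim(query_embedding, passage_embeddings)[0]
        top = scores.argsort(descending=True)[:k].tolist()
        return [(i, passages[i]) for i in top]

    def answer_with_citations(query):
        """Prime a generative model with retrieved sources it must cite."""
        sources = retrieve(query)
        context = "\n".join(f"[{i}] {text}" for i, text in sources)
        prompt = (
            "Answer using ONLY the numbered sources below, citing them "
            "as [id]. If they do not contain the answer, say so.\n\n"
            f"{context}\n\nQuestion: {query}\nAnswer:"
        )
        return llm_generate(prompt)  # hypothetical LLM call

    # Updating the "memory" is near-instant: append and re-encode new
    # passages -- no billion-parameter retraining required.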

Hebbia is building the semantic search memory layer that gives generative AI these superpowers.

If you’re interested in using this technology in your workplace, reach out to Hebbia and learn more about what we do.
