Choosing between public and private LLMs

Large language models (LLMs) continue to command a blazing bright spotlight, as the debut of ChatGPT captured the world’s imagination and made generative AI the most widely discussed technology in recent memory (apologies, metaverse). ChatGPT catapulted public LLMs onto the stage, and its iterations continue to rev up excitement—and more than a little apprehension—about the possibilities of generating content, or code, with little more than a few prompts.

While individuals and smaller businesses consider how to brace for, and benefit from, the ubiquitous disruption that generative AI and LLMs promise, enterprises face concerns, and a crucial decision, all their own. Should they opt to leverage a public LLM such as ChatGPT, or their own private one?

Public vs. private training data

ChatGPT is a public LLM, trained on vast troves of publicly available online data. By processing data sourced from far and wide, public LLMs offer mostly accurate (and frequently impressive) results for just about any query or content creation task a user puts to them. Those results also continue to improve as the underlying models are retrained and updated.

Even so, pulling source data from the wild internet means that public LLM results can sometimes be wildly off base, and dangerously so. The potential for generative AI “hallucinations,” where the technology simply says things that aren’t true, requires users to be savvy. Enterprises, in particular, need to recognize that using public LLMs could lead employees astray, resulting in severe operational issues or even legal consequences.

As a contrasting option, enterprises can create private LLMs that they own themselves and train on their own private data. The resulting generative AI applications offer less breadth, but a greater depth and accuracy of specific knowledge, speaking to the enterprise’s particular areas of expertise.

Challenges posed by public LLMs

For many enterprises, unique data is an invaluable currency that sets them apart. Enterprises are, therefore, extremely (and rightfully) concerned about the risk that their own employees could expose sensitive corporate or customer data by submitting that data to ChatGPT or another public LLM.

