ChatGPT took the world by storm in 2023, becoming the fastest-growing consumer application of all time. Along the way, OpenAI inspired thousands of people to start building "AI" applications. The technology is undeniably powerful, but when it comes to real-world use cases, most people seem to be stuck at chat. So how do you add AI to your application beyond just bolting on a chat widget?
In this post, I'm going to give you an easy way to think about adding AI to your existing applications that you can start implementing this week.
So what's wrong with chat? Nothing! If your user base is already used to interacting with you over chat, then that might be exactly the user interface (UI) you need. Most of my customers, though, have spent years refining beautiful, productive interfaces tailored to the specific needs and workflows of their users. It would be a shame if the only option for using AI were to slap a chatbot on top of that UI.
It turns out that Large Language Models (LLMs) are pretty great at taking unstructured inputs and returning structured outputs. Let's take the example of accepting an address from a user. The common parts of an address might be: street address, unit or apartment number, city, state, and ZIP code.
In a web form, you'd typically see an input box for each part of the address. This pattern works fine, but what if you were building a voice application instead? One approach is to ask for each part of the address individually: "What is your street address?" "Is there a unit or apartment number?" "What city is that in?" "What state is that in?" It quickly becomes very painful for the user. Wouldn't it be better to just let them say "1234 Main Street Phoenix, AZ 85016"? Yes!
In my 10 years of experience running voice applications with the general public, I've learned that nobody actually speaks their address in this manner. What you actually get is something like: "Hold on, what's the address here? The address is 1234 ummmm Main Street in Phoenix the zip is 85012". Good luck string-parsing your way out of that. Enter the LLM:
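Here's a minimal sketch of the idea in Python. It builds an OpenAI-style chat prompt asking the model to return only JSON, then validates the model's reply into a typed address. The prompt wording, field names, and helper functions here are my own illustrations, not part of any particular library; the actual API call to the model is omitted.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Address:
    street: str
    unit: Optional[str]
    city: str
    state: str
    zip_code: str

SYSTEM_PROMPT = (
    "Extract the postal address from the user's message. "
    "Reply with only a JSON object with keys: street, unit, city, "
    "state, zip_code. Use null for any part that is missing."
)

def build_messages(utterance: str) -> list:
    """Build an OpenAI-style chat messages list for address extraction."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": utterance},
    ]

def parse_reply(reply_json: str) -> Address:
    """Validate the model's JSON reply into a typed Address."""
    data = json.loads(reply_json)
    return Address(
        street=data["street"],
        unit=data.get("unit"),
        city=data["city"],
        state=data["state"],
        zip_code=data["zip_code"],
    )

# A reply like the model would plausibly return for the rambling
# utterance above:
reply = ('{"street": "1234 Main Street", "unit": null, '
         '"city": "Phoenix", "state": "AZ", "zip_code": "85012"}')
print(parse_reply(reply))
# Address(street='1234 Main Street', unit=None, city='Phoenix',
#         state='AZ', zip_code='85012')
```

The key move is that the messy, meandering transcript goes in as-is, and validation happens on the model's structured reply rather than on the user's words.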
Now let's go back to a web example again. How many times have you gone to the FedEx or UPS store with an addressed package only for them to have to manually type the address into their system? What if they could just take a picture of an address and have it automatically parsed? We can do that too!
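As a sketch of how the image version might look, the snippet below builds an OpenAI-style vision request that sends a photo of a shipping label as a base64 data URL alongside the extraction instructions. The prompt text and field names are assumptions for illustration; the network call itself is shown only as a comment.

```python
import base64

def build_image_messages(image_bytes: bytes, mime: str = "image/jpeg") -> list:
    """Build an OpenAI-style chat messages list that pairs a photo of an
    address label with instructions to return it as structured JSON."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode()
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "Read the mailing address in this photo and reply "
                        "with only a JSON object with keys: street, unit, "
                        "city, state, zip_code."
                    ),
                },
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]

# The actual request would then look something like (requires the
# `openai` package and an API key):
#   client.chat.completions.create(
#       model="gpt-4o",
#       messages=build_image_messages(photo_bytes),
#   )
```

From there, the reply can be validated exactly the same way as the voice transcript: it's the same structured-output problem with a different unstructured input.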
That's just a single example in an ocean of possibilities. Look at your application and find the areas where you are forcing your users to do the work of structuring their inputs so that you can process information. I bet you can find a handful of places where your users could dramatically increase their data entry speed if you simply let them provide the data in an unstructured way. New hire needs to be onboarded? Snap a photo of their resume and push the information into your HR portal. Received a paper invoice in the mail? Scan it and send it into your accounting system. Spent your afternoon whiteboarding next quarter's release? Take a picture and create your tasks in your project management system immediately. Rather than splitting up an input into a dozen or more form fields, maybe just use a single text area. The opportunities are limitless.
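For any of these workflows, reliability comes from constraining what the model is allowed to return. One way to do that is with a JSON Schema; OpenAI-style APIs accept a `response_format` built around one. The invoice fields below are assumptions chosen to match the paper-invoice example, not a schema from any real accounting system.

```python
import json

# JSON Schema describing an invoice, suitable for constraining model
# output via an OpenAI-style response_format.
INVOICE_SCHEMA = {
    "name": "invoice",
    "schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "invoice_number": {"type": "string"},
            "date": {"type": "string"},
            "total": {"type": "number"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "amount": {"type": "number"},
                    },
                    "required": ["description", "amount"],
                },
            },
        },
        "required": ["vendor", "total"],
    },
}

# Passed to the chat API alongside the scanned invoice image:
response_format = {"type": "json_schema", "json_schema": INVOICE_SCHEMA}

# Sanity-check that the schema is serializable JSON.
print(len(json.dumps(response_format)) > 0)
```

The same pattern (a schema per document type) covers the resume and whiteboard examples too; only the property names change.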
We've released an open-source library for .NET developers that makes using OpenAI to extract structured data from images fast and easy. Point your developers to our GitHub Repository to get started, and read more about using OpenAI and C# for image understanding in this blog post.
Of course, if you're looking for opportunities to add AI to your existing applications and you'd like some advice, we'd be more than happy to take a look and give you some suggestions too. Schedule a free consultation today!