AI-integrated Search

Case Study

Role: UX Researcher
Date: 2024

This project centered on the integration of an AI assistant into search and the transition experience into the chatbot. I needed to identify the gap between OCI's intent recognition and response intelligence and users' expectations. We also wanted to understand where in-depth answers belonged relative to quick results, and how users mentally distinguished between the two and behaved accordingly.

My role as the researcher on this project was to evaluate the effectiveness of our new AI-in-search component and strategy. To do this, I translated our open questions into objective, measurable research questions, then designed a usability test to address them. I recruited four external users through UserZoom and conducted moderated sessions to gauge usability and gather insights.

Screenshot of a search panel showing available OCI services and an "Ask Oracle" GenAI option at the bottom.
Testing the Oracle AI assistant integration in the global search bar.
Details

Goal: Rate the overall usability of the new AI integration in search and the possible transition to a chatbot; in particular, analyze the location, timing, and purpose of search interactions. Would users notice and use the AI assistant at all (available through an "Ask Oracle" button in the search panel)? If they used it, would it generate responses in a sufficient time frame? Would the information be relevant to them, or would they expect a different kind of content from search?

Challenges: Our AI responses within search were expected to be slow, and they did generate a lot of frustration during the tests. Additionally, since the test was conducted in a fully rigged demo environment, participants would sometimes go down rabbit holes into irrelevant areas or get distracted by the console, which required some guidance and took up time.

Deliverables: Moderated usability test, report, presentation, and documentation of discovered issues and next steps


1. Overview & Research Ask

On this team I worked with two designers and two product managers. By collaborating with the product managers, I defined the following research questions:

  • The current way to recognize intent, that is, to identify a natural-language search as such, is to count the spaces in the search dropdown query (a minimal sketch of this heuristic follows the list). Is this sufficient?
  • How fast does the GenAI response need to be when we’re responding to the user prompt?
  • Do users understand how to “continue the conversation” when moving from search to chat, and does this interaction feel natural enough to transition them over to the chatbot?
  • Do users need a predictable user experience, or is it okay to change the experience (between search and chatbot) based on the intent we detect?
  • What quality threshold must GenAI responses meet for users to judge an answer useful?
  • Do the two entry points, search and chatbot, make sense?
  • If someone types the same query into the search box and the chatbot, do they expect the same response? For example, what if they type the name of a resource into both?
  • What results do users want to see when they’re talking to the bot or searching? Are they expecting answers, or are they expecting the bot to execute things on their behalf?

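As context for the first question, here is a minimal sketch of the kind of space-counting heuristic being evaluated; the function name and word-count threshold are illustrative assumptions, not OCI's actual implementation:

    // Hypothetical sketch of a space-counting intent heuristic (threshold is an assumption).
    const NATURAL_LANGUAGE_WORD_THRESHOLD = 4;

    function looksLikeNaturalLanguage(query: string): boolean {
      // Count whitespace-separated words typed into the search dropdown.
      const wordCount = query.trim().split(/\s+/).filter(Boolean).length;
      // Few words -> likely a keyword or resource-name lookup; many words -> likely a question.
      return wordCount >= NATURAL_LANGUAGE_WORD_THRESHOLD;
    }

    // Example: "compute" stays a keyword search, while
    // "how do I resize my compute instance" would route to the GenAI answer.
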
To make sure the product would be in line with market expectations, I explored how competitors handled the problem. Azure, Amazon Web Services, and Google Cloud offered similar services; however, no feature set, including Oracle's, was entirely alike. The competitor analysis also helped identify consistent terminology and option grouping.

2. Research Methodology

To address the questions above, I drafted a usability test in which users interacted with the new search functionality. The test would use our demo environment to reflect the real-life retrieval time of an answer.

This moderated test consisted of four separate one-hour sessions conducted through UserZoom with external users, each completing a set of tasks. Participants were paid via Amazon gift cards through our UserZoom agreement. They were screened for general cloud experience, with varying years of experience, to capture as wide a range of behaviors as possible. The product managers and designers sometimes observed in silence. After each task, participants were asked to rate the ease or difficulty of what they had just done on a scale of one to five, with one being easy and five difficult. Quantifying the test this way allowed me to set a usability baseline and pinpoint which areas needed the most attention. The tasks presented to the participants were:

  • Enter a query in the search box, and evaluate the quality and timing of the response.
  • Enter a unique resource ID (provided for them) in the search box, and evaluate the response & location.
  • Evaluate the transition from search to the chatbot.
  • Provide general feedback on your experience and expectations.
Screenshot of a search panel showing a loading spinner while generating a GenAI summary.
The panel that participants saw during the search interaction.
Screenshot of a search dropdown displaying text explaining the AI assistant in the panel.
The resulting answer from the AI assistant.

3. Findings

I synthesized the results of all four tests into a comprehensive report. My top-level findings were the following:

  • Our current intent recognition method is sufficient, and most users search for a keyword. 
  • Our AI response is too slow; most users would only wait an average of five seconds.
  • The AI in search is not intelligent enough. It lacks actionable material, feels generic, and is not as helpful as it could be. The generative AI summary needs better formatting, links, and actionable steps.
  • The AI in chatbot is sufficient but could be improved with OCI-specific actions and responses. 

I collected these insights into a generalized presentation and a more thorough document, which listed each issue and its relevant suggestion, provided by either the participants or myself. One of the most relevant quotes I heard from this series of tests was this: "Don't tell me generic answers, start doing the work for me. Google Home doesn't tell me how to check my internet, it's doing it for me."

Another surprising finding was that users would sometimes search non-Oracle sources like ChatGPT for answers about OCI, either because they were more familiar with those tools or trusted those sources more. This indicated that the documentation team or the chatbot's RAG pipeline had a lot of work ahead to improve user trust.

Screenshot of the GenAI results. The text reads: All users saw the Oracle Assistant button and correctly assumed this would continue the conversation. None of the users wanted to be moved there automatically - they want to click into the Assistant to engage with it. When in the chatbot, participants still want to keep their current page under the chat, but carry over the content of their search/question. It is alright to change the experience between the search and the dropdown based on the intent we detect, but the core answer should be the same. What can change is the level or depth of information - all users expect direct links and information from the search, and more thorough results, additional information, and automated tasks from the chatbot.
Example slide of the research findings regarding transition from search to chat.
Screenshot of the GenAI findings. The slide reads: "Current quality is drastically insufficient: it lacks any actionable material, feels generic, and is not as helpful as it could be. The generative AI summary needs more links and actionable steps, referencing direct OCI resources from the customer's own infrastructure. However, participants did feel this was an improvement over having to skim many articles on their own." A quote says, "This feels like a ChatGPT answer." One participant skipped the Open Assistant button entirely because they did not want to "ask for help".
Example slide of the research findings regarding chat response quality.

4. Reviews, Results, & Takeaways

The findings were presented to the project team and the organization at large, as various teams owned individual components of the project. The UX group discussed specific issues and next steps, leading to several insights that significantly improved the GenAI search feature. Users reported that the AI-generated summary took too long to load, prompting the AI team to consider solutions such as caching answers to frequent questions, speeding up backend response times, or providing procedural results. Another key finding was that responses were too generic and not sufficiently tailored to each user, which led the team to promote runbooks and quick-start suggestions. Feedback also highlighted UI issues, including the visibility of the Ask Oracle icon and concerns about losing chat history when closing the chatbot; these were addressed through updates to the design and text. While some insights require more complex resolutions than simple text changes, we have identified the gaps and are working toward a robust, user-friendly AI experience that users will find genuinely useful and impactful.
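To make the caching idea concrete, here is a rough sketch of a time-bounded answer cache keyed by the normalized query; the names, TTL, and structure are assumptions for illustration rather than the team's actual design:

    // Hypothetical answer cache for frequent questions (not the shipped implementation).
    interface CachedAnswer {
      answer: string;
      expiresAt: number; // epoch milliseconds
    }

    const ANSWER_TTL_MS = 15 * 60 * 1000; // assumed 15-minute freshness window
    const answerCache = new Map<string, CachedAnswer>();

    async function getAnswer(
      query: string,
      generate: (q: string) => Promise<string>, // the slow GenAI call
    ): Promise<string> {
      const key = query.trim().toLowerCase(); // normalize so repeated questions hit the cache
      const hit = answerCache.get(key);
      if (hit && hit.expiresAt > Date.now()) {
        return hit.answer; // served instantly, skipping the GenAI round trip
      }
      const answer = await generate(query);
      answerCache.set(key, { answer, expiresAt: Date.now() + ANSWER_TTL_MS });
      return answer;
    }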

Takeaways

This project was deeply interesting to me because of how much it exposed about user behaviors around search itself, and then around AI-assisted search. I gained many insights that are broadly applicable to other projects, from the timing question to expectations around search behavior and how those changed with a participant's years of experience: less-experienced users expected the search box to understand and respond to conversational queries, while those with more experience expected keyword prompts only. The project team also appreciated sitting in on the usability studies, as they saw the changes they would need to make in real time and therefore trusted the findings I provided.