This project centered on analyzing the integration of an AI assistant into search, and the transition experience from search to the chatbot. I needed to identify the delta between OCI's intent recognition and the intelligence of its responses versus users' expectations. We also wanted to understand where in-depth answers belonged relative to quick results, and how users mentally distinguished between the two and behaved accordingly.
My role as a researcher on this project was to evaluate the effectiveness of our new AI-in-search component and strategy. To do this, I translated our open questions into objective, measurable research questions, then designed a usability test to answer them. I recruited four external users through UserZoom and conducted moderated sessions to gauge usability and gather insights.
Goal: Assess the overall usability of the new integration of AI into search and the potential transition to a chatbot. In particular, analyze the location, timing, and purpose of search interactions. Would users notice and use the AI assistant at all (available through an "Ask Oracle" button in the search panel)? If they used it, would it generate responses in a reasonable time frame? Would the information be relevant to them, or would they expect a different kind of content from search?
Challenges: Our AI responses within search were expected to be slow, and they generated a lot of frustration during the tests. Additionally, since the test was conducted in a fully rigged demo environment, participants would sometimes go down rabbit holes into irrelevant areas or get distracted by the console, which required some guidance and took up time.
Deliverables: Moderated usability test, report, presentation, and documentation of discovered issues and next steps
On this team I worked with two designers and two product managers. Collaborating with the product managers, I refined our research questions to the following:
To make sure the product would be in line with what the market expected, I explored how competitors handled the problem. Azure, Amazon Web Services, and Google Cloud offered similar services; however, no two feature sets, including Oracle's, were entirely alike. The competitor analysis also helped identify consistent terminology and option grouping.
To address the questions above, I drafted a usability test in which users would interact with the new search functionality. The test would use our demo environment to reflect the real-life retrieval time of an answer.
This moderated test comprised four separate one-hour sessions conducted through UserZoom with external users, each completing a set of tasks. Participants were paid via Amazon gift cards through our UserZoom agreement. These users were screened for general cloud experience, with varying years of experience, to capture as wide a range of behaviors as possible. The product managers and designers sometimes observed in silence. After each task, participants were asked to rate the ease or difficulty of what they had just done on a scale from one to five, one being easy and five being difficult. Quantifying the usability test this way allowed me to set a baseline for usability and pinpoint which areas needed the most attention. The tasks presented to the participants were:
I synthesized the results of all four tests into a comprehensive report. My top-level findings were the following:
I collected these insights into a generalized presentation and a more thorough document, which listed each issue along with a relevant suggestion, provided either by participants or by me. One of the most telling quotes I heard from this series of tests was: "Don't tell me generic answers, start doing the work for me. Google Home doesn't tell me how to check my internet, it's doing it for me."
Another surprising finding was that users would sometimes turn to non-Oracle sources like ChatGPT for answers about OCI, either because they were more familiar with those tools or because they trusted them more. This result indicated that the documentation team and the chatbot's RAG pipeline had a lot of work ahead of them to improve user trust.
The findings were presented to the project team and the organization at large, as various teams owned individual components of the project. The UX group discussed specific issues and next steps, leading to several insights that significantly improved the GenAI search feature. Users reported that the AI-generated summary took too long to load, prompting the AI team to consider solutions such as caching answers to frequent questions, speeding up backend response times, or providing procedural results. Another key finding was that responses were too generic and not sufficiently tailored to each user, which led the team to promote runbooks and quick-start suggestions. Feedback also highlighted UI issues, including the visibility of the Ask Oracle icon and concerns about losing chat history when closing the chatbot; these were addressed through updates to the design and text. While some insights require more complex resolutions than simple text changes, we have identified the gaps and are working toward a robust, user-friendly AI experience that users will find genuinely useful.
This project was deeply interesting to me because of how much it exposed about user behaviors around search itself, and then around AI-assisted search. I gained insights that are broadly applicable to other projects, from the timing question to expectations around search behavior and how those changed with a participant's years of experience: less-experienced users expected the search box to respond to conversational queries, whereas those with more experience expected it to handle keyword prompts only. The project team also appreciated sitting in on the usability studies, as they saw the changes they would need to make in real time and therefore trusted the findings I provided.