OVERVIEW

In 2023, I led design direction for a north-star vision of next-generation conversational shopping across Alexa-enabled devices. Our work explored how emerging generative-AI models could make shopping more adaptive, ambient, and multimodal—bridging voice, visual, and touch interactions across surfaces.

The design vision connected the customer experience across voice-only speakers, voice/touch displays, large-screen devices, and mobile hand-offs—ensuring a coherent assistant that adapted to each modality and context.

 

THE CHALLENGE

Existing Alexa shopping experiences were scripted and rule-based, optimized for repeatable tasks (“Reorder paper towels” or “Buy now”). They lacked reasoning, summarization, and cross-device continuity. New large-language-model (LLM) capabilities created an opportunity to evolve toward open-ended conversations—able to infer intent, reason through options, and compose visual responses dynamically.
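To make the contrast concrete, here is a minimal sketch in illustrative TypeScript. It is not Alexa's actual architecture or API; all names are hypothetical. It shows the difference between a scripted intent table and a generative path that asks a model for a structured, renderable response.

```typescript
// Illustrative contrast between a deterministic, scripted handler and an
// LLM-driven one that composes a structured response. Hypothetical names only.

// Shape of a response the client can both speak and render.
interface GeneratedResponse {
  intent: "discover" | "compare" | "decide" | "reorder";
  speech: string;                                            // concise spoken summary
  visual?: { layout: "grid" | "comparison" | "detail"; items: string[] };
  followUps: string[];                                       // suggested next turns
}

// Scripted model: fixed utterances map to fixed, pre-authored responses.
const scriptedResponses: Record<string, GeneratedResponse> = {
  "reorder paper towels": {
    intent: "reorder",
    speech: "Okay, reordering your usual paper towels.",
    followUps: [],
  },
  "buy now": {
    intent: "decide",
    speech: "Okay, placing the order.",
    followUps: [],
  },
};

// Generative model: open-ended requests go to an injected LLM client that
// returns structured JSON the client validates and renders.
async function handleUtterance(
  utterance: string,
  callModel: (prompt: string) => Promise<string>,
): Promise<GeneratedResponse> {
  const scripted = scriptedResponses[utterance.toLowerCase()];
  if (scripted) {
    // Deterministic path for well-known, repeatable tasks.
    return scripted;
  }
  // Open-ended path: infer intent, reason through options, compose a visual response.
  const raw = await callModel(
    `Return JSON matching GeneratedResponse for the shopping request: "${utterance}"`,
  );
  return JSON.parse(raw) as GeneratedResponse;
}
```

One reading of the shift described above is that the deterministic path stays in place for repeatable tasks, while the generative path handles everything the script never anticipated.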

The challenge was to translate this technical leap into an experience that felt human, trustworthy, and beautiful while remaining coherent across devices.

EXISTING ALEXA SHOPPING FLOWS WERE SCRIPTED AND DETERMINISTIC

 

MY ROLE

Senior UX Designer

  • Defined the north-star multimodal framework and customer-experience vision.

  • Led a cross-disciplinary pod of conversation, visual, and motion designers.

  • Consulted with data-science and engineering on feasibility and scope.

  • Delivered a CX vision deck and design guide aligning leadership and product teams on AI-shopping strategy.

 
 

NEW MENTAL MODEL: FROM SCRIPTED TO GENERATED

We reimagined the Alexa assistant from a voice command processor to a shopping concierge—able to reason, summarize, and curate. The screen interface evolved into a living surface where layout, motion, and content adapt in real time to the conversation, each transition reflecting the system’s confidence and thought process.

For voice-only devices, the audio experience became more fluid and expressive, with seamless handoffs to mobile for complex, visually driven interactions.

To support this generative behavior, we designed flexible layouts and transitions that absorb variation gracefully—maintaining visual consistency and comprehension even as the system’s outputs shift dynamically.
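As a rough illustration of what "absorbing variation gracefully" could mean in practice, the sketch below normalizes generated content into bounded templates so layouts stay glanceable regardless of how much the model produces. The slot limits and helper names are assumptions for this example, not a shipped specification.

```typescript
// Hypothetical sketch: clamp variable-length generated content into a bounded
// template so the surface remains predictable as outputs shift.

interface GeneratedCard {
  title: string;
  blurb: string;
}

interface TemplateSpec {
  maxCards: number;      // the layout never shows more than this many items
  maxBlurbChars: number; // copy stays glanceable at a distance
}

// Density flexes by viewport; assumed breakpoints for illustration.
const templateForViewport = (width: number): TemplateSpec =>
  width >= 1280 ? { maxCards: 6, maxBlurbChars: 120 }  // large screens
  : width >= 800 ? { maxCards: 4, maxBlurbChars: 90 }  // smart displays
  : { maxCards: 2, maxBlurbChars: 60 };                // small displays

function absorbVariation(cards: GeneratedCard[], spec: TemplateSpec): GeneratedCard[] {
  return cards.slice(0, spec.maxCards).map((card) => ({
    title: card.title,
    blurb:
      card.blurb.length > spec.maxBlurbChars
        ? card.blurb.slice(0, spec.maxBlurbChars - 1).trimEnd() + "…"
        : card.blurb,
  }));
}
```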

 

EXPERIENCE TENETS

 

ALEXA UNIFIES INTELLIGENCE ACROSS MULTIPLE DEVICES IN DIFFERENT ROOMS

I drafted tenets to establish a shared understanding across our teams of how multimodal intelligence should behave:

  1. Lead with ambient interactions – effortless, voice-first engagement, augmented by visuals only when helpful.

  2. Generate in real time – adaptive dialogue over static navigation.

  3. Be efficient by default – quick paths to completion that can expand for deeper discovery.

  4. Adapt to context – tone, pacing, and modality respond to environment and user context.

  5. Learn and remember – short- and long-term context carryover to build trust and continuity.

  6. Unify intelligence across devices – one consistent personality and reasoning model everywhere.

DESIGN PRINCIPLES

To guide feature design and system behavior, I codified principles connecting AI reasoning to interface expression:

  1. Be frugal with attention – concise voice, glanceable visuals.

  2. Reveal progressively – information unfolds like conversation.

  3. Balance variation with consistency – dynamic, generated output within predictable forms.

  4. Compose adaptively – layout and density flex by context and intent.

  5. Curate transparently – personalized recommendations with explanation.

  6. Elevate the everyday – calm, refined aesthetics suitable for the home.

 

DESIGN FRAMEWORK

We developed a design framework that laid the groundwork for a multimodal design system composed of configurable patterns that visualize reasoning and adapt fluidly across device types. These included vertical grids for inspiration and discovery, horizontal and comparative layouts for evaluation, focused detail views for decision-making, and related-content modules that expand context and continuity.

This set of patterns formed a scalable foundation for real-time, AI-generated composition—ensuring that content remained coherent, flexible, and expressive across Alexa’s ecosystem of screens and devices.
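One way to picture this pattern vocabulary is as a small typed set of layout templates that a generation pipeline selects from by conversation stage. The sketch below is a hypothetical encoding for illustration, not the actual design-system spec.

```typescript
// Hypothetical encoding of the pattern vocabulary described above.
// Names and fields are illustrative only.

type ShoppingPattern =
  | { kind: "discoveryGrid"; columns: number; items: string[] }                     // inspiration & browsing
  | { kind: "comparison"; attributes: string[]; items: string[] }                   // side-by-side evaluation
  | { kind: "detail"; item: string; highlights: string[] }                          // focused decision-making
  | { kind: "relatedModules"; modules: { heading: string; items: string[] }[] };    // context & continuity

type ConversationStage = "discover" | "compare" | "decide" | "extend";

// The generative layer picks a pattern for the stage; renderers on each device
// decide how that pattern is expressed (columns, density, motion).
function patternFor(stage: ConversationStage, items: string[]): ShoppingPattern {
  switch (stage) {
    case "discover":
      return { kind: "discoveryGrid", columns: 3, items };
    case "compare":
      return { kind: "comparison", attributes: ["price", "rating"], items };
    case "decide":
      return { kind: "detail", item: items[0] ?? "", highlights: [] };
    case "extend":
      return { kind: "relatedModules", modules: [{ heading: "You might also like", items }] };
  }
}
```

Constraining generation to a closed set of patterns like this is one way to keep AI-composed screens coherent while still letting content vary turn by turn.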

IMPACT

Our design vision became the unifying artifact for seven principal-level product leads who owned independent workstreams, aligning them around a shared north-star direction for multimodal shopping. It gained SVP-level endorsement and set the strategic foundation for future conversational-AI initiatives, informing follow-on work in personalization, explainability, and cross-device continuity. Beyond influencing roadmap priorities, this effort elevated design’s role from executional contributor to strategic partner in shaping AI experience quality.

 

SCRIPT OF TYPICAL ALEXA CONVERSATIONAL ERROR WITHOUT LLM CAPABILITY

 

AUDIO SIMULATION OF LLM-POWERED ALEXA


REFLECTION

Designing for LLMs means crafting expressive systems that visualize intelligence. This work incorporated motion, pacing, and multimodal design to make complex AI reasoning intuitive and human—turning nascent technology into an ambient, everyday experience.