At this year’s Microsoft Ignite conference, a surprising unveiling has taken center stage: a tool capable of crafting photorealistic avatars that can convincingly articulate scripted dialogue, even if the original speaker never uttered those words.
Named the Azure AI Speech Text-to-Speech Avatar, the feature is now available in public preview and lets users generate videos of an avatar speaking. Users upload images of the person they want the avatar to resemble and write a script; Microsoft’s tool then animates the avatar using a trained model. A separate text-to-speech model, either prebuilt or trained on the person’s voice, reads the script aloud.
Microsoft describes this innovation as a means to streamline video creation, catering to diverse purposes like crafting training videos, product introductions, customer testimonials, and more, all from simple text inputs. Moreover, the avatar can serve as a foundation for conversational agents, virtual assistants, and chatbots.
The avatars can speak in multiple languages and, in chatbot scenarios, can draw on AI models such as OpenAI’s GPT-3.5 to respond to unscripted queries from customers.
Microsoft acknowledges, however, that the tool could be misused, citing instances where similar technology has been exploited for propaganda and false news. To limit misuse, Azure subscribers will for now have access only to prebuilt avatars; custom avatars are limited to specific use cases and available by registration only.
Yet, ethical dilemmas loom large. The recent SAG-AFTRA strike highlighted concerns about AI-generated digital likenesses and fair compensation for actors. Will Microsoft and its clientele confront similar ethical considerations?
Questions linger about companies using actors’ likenesses without adequate compensation or acknowledgment. Microsoft has not responded to inquiries on this matter, nor on whether it will require AI-generated avatars to be labeled as such, in line with labeling trends on platforms like YouTube.
Alongside the avatar tool, Microsoft is introducing “Personal Voice,” a capability within its custom neural voice service. Given a one-minute audio sample as a prompt, it can produce a personalized voice within seconds, which can then be used for personalized voice assistants, multilingual content dubbing, and bespoke narration for various media.
To preempt legal entanglements, Microsoft mandates “explicit consent” via a recorded statement before users leverage Personal Voice. Access to this feature remains restricted, and usage limitations prevent the voice from reading user-generated or open-ended content.
Microsoft declined to answer TechCrunch’s questions about how actors would be compensated for their voice contributions, or whether it would implement technology to distinguish AI-generated voices.
The emergence of these groundbreaking tools marks a transformative juncture in digital interaction. Yet, ethical considerations and the potential impact on various industries warrant comprehensive discussions and proactive measures to ensure responsible and equitable utilization.