Note: This article on testing live audio-only content was written by a human, with the assistance of artificial intelligence.

Explanation of the success criteria

WCAG 1.2.9 Audio-only (Live) is a Level AAA Success Criterion. It states that an alternative for time-based media that presents equivalent information for live audio-only content is provided.

This success criterion aims to make live audio-only content, such as web-based audio conferencing, live speeches, and radio webcasts, accessible through a text alternative. A live text caption service, typically operated by a trained human who transcribes spoken words with minimal delay, provides real-time captions for people who are deaf or hard of hearing. These services also include notes on non-spoken audio crucial for understanding the event.

A transcript may work if the live audio follows a set script, but a live caption service is preferred as it matches the audio pace and can adjust to any deviations from the script.

This success criterion applies to audio broadcasts, not to two-way audio calls in web apps. Captioning responsibility lies with content providers or the host of the call, not with the application.

Note that this Success Criterion is at a conformance level of AAA. This means that it is generally considered aspirational, going beyond the standard A and AA conformance levels. It addresses more specific accessibility needs and is not mandatory for all websites or content. However, achieving Level AAA can provide additional benefits in terms of inclusivity.

Who does this benefit?

  • People who are deaf, with limited or no hearing, can access information in the audio content.
  • People who are deaf-blind, with limited or no vision and hearing, can access information in the audio content.
  • People who cannot understand real-time audio can read an equivalent.

Testing via Automated testing

Automated testing for live audio content transcripts checks for the presence of live text containers or ARIA live regions. It is fast, as it performs only structural checks, and is highly scalable across many streams.
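As a minimal sketch of such a structural check, the snippet below scans a page's HTML for `aria-live` attributes or live-region roles, which often mark a live caption container. It uses only the Python standard library; the sample markup and `id` are hypothetical.

```python
from html.parser import HTMLParser

class LiveRegionFinder(HTMLParser):
    """Collects elements that announce dynamic text updates:
    anything with an aria-live attribute or a live-region role."""
    LIVE_ROLES = {"log", "status", "alert", "marquee", "timer"}

    def __init__(self):
        super().__init__()
        self.live_regions = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "aria-live" in attrs or attrs.get("role") in self.LIVE_ROLES:
            self.live_regions.append((tag, attrs))

# Hypothetical caption container markup for illustration
html = '<div id="captions" aria-live="polite" role="log"></div>'
finder = LiveRegionFinder()
finder.feed(html)
print(len(finder.live_regions))  # → 1
```

Note that this confirms only that a plausible caption container exists in the markup, which is exactly the limitation discussed next: structure, not functionality.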

Automated testing for live audio content transcripts has several limitations. It cannot evaluate transcript accuracy, measure the latency of live caption delivery, or confirm that spoken content is fully covered. It is also prone to false positives and negatives, since it detects structure but not functionality. Furthermore, it lacks an understanding of language specifics, which can lead to inaccurate assessments of language support.

Testing via Artificial Intelligence (AI)

AI-based testing for transcripts of live audio content can detect the presence of live caption feeds or real-time text elements, and assess the accessibility of the caption area, formatting, and structure. It can process live streams in near real-time and offers good scalability through cloud-based or live AI transcription pipelines.

AI-based testing for live audio content transcripts can compare speech to text in real time, though it may miss nuances. While some AI models can estimate caption delay based on voice-to-text alignment, they may struggle with detecting skipped segments and understanding context fully. The accuracy of AI in identifying false positives or negatives depends on the quality of the stream, layout, and live text. Additionally, AI may face challenges with less-common languages, dialects, or speech variations.
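One way such a speech-to-text comparison can be scored is a word-error-rate-style measure between a reference (e.g. a scripted segment) and the live caption text. The sketch below uses a longest-common-subsequence alignment from the standard library; it is a rough illustration, not a production metric.

```python
import difflib

def word_error_estimate(reference: str, caption: str) -> float:
    """Rough error rate: the fraction of reference words that have no
    matching word in the caption, using an LCS-based alignment."""
    ref_words = reference.lower().split()
    cap_words = caption.lower().split()
    matcher = difflib.SequenceMatcher(None, ref_words, cap_words)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / len(ref_words) if ref_words else 0.0

# "brown" was skipped by the captioner: 1 of 4 reference words missed
print(word_error_estimate("the quick brown fox", "the quick fox"))  # → 0.25
```

A score like this captures coverage but not the nuance, context, or non-speech information noted above, which is why it can only supplement human review.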

Testing via Manual testing

Manual testing for live audio transcripts verifies the availability of live captioning, checks the real-time accuracy and completeness of the transcript, and measures any delay in captioning. It assesses whether all spoken content is captured accurately and verifies the technical implementation for accessibility, such as readability and contrast. This approach also minimizes false positives and negatives through direct observation and allows accuracy to be verified in multiple languages.
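To measure the captioning delay, a tester can log when a phrase is spoken and when its caption appears, then average the gap. The helper below assumes the tester records hypothetical (spoken, caption) timestamp pairs in seconds.

```python
def average_caption_delay(events):
    """events: list of (spoken_time_s, caption_time_s) pairs logged by
    the tester for sampled phrases. Returns the mean delay in seconds."""
    delays = [caption - spoken for spoken, caption in events]
    return sum(delays) / len(delays)

# Hypothetical timestamps from one monitoring session
samples = [(0.0, 3.1), (10.0, 13.4), (20.0, 22.8)]
print(round(average_caption_delay(samples), 2))  # → 3.1
```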

Manual testing for transcripts of live audio content requires full live monitoring, which slows the process. It also lacks scalability, since a human observer is needed for each stream.

Which approach is best?

No single approach for testing live audio-only alternatives is perfect. However, using the strengths of each approach in combination can have a positive effect.

Automated testing can detect missing text alternatives, although it is limited to structural detection. AI-based testing can be useful for monitoring live transcription quality and timing. However, manual testing is still ideal for testing the accuracy and completeness of live text during audio-only events.

Related Resources