
To date we have transcribed more than 42,000 Russian and Ukrainian language television news broadcasts through Google's Speech-to-Text API, with the resulting 2.1GB of spoken word transcripts made available for researchers and journalists to explore how Russia's invasion of Ukraine has been communicated, especially to the Russian public. The Visual Explorer already displays these transcripts inline as part of its interface, and Chrome browser users can have them automatically translated into English on-the-fly using Chrome's built-in Google Translate integration. What would it look like to automatically translate these transcripts ourselves using Google Translate?

Google Translate's batch translation support makes it trivial to translate large text files from one language into another. Why then do television news broadcasts pose a unique challenge? The answer is that, like most NMT systems, the Translate API is designed for ordinary textual documents. In broadcast news, each word in the source language is associated with a precise subsecond timestamp marking when it was spoken. That timestamp must be carried through to the translated transcript, with the added complication that there is rarely a one-to-one correspondence between source and target language.

One simple approach is to split the transcript into sentences and translate a sentence at a time, collapsing timestamp resolution to the level of sentence start and stop points. Unfortunately, punctuation in transcribed speech is an artificial construct, added by machines or humans to make transcripts more readable, but not a native part of the stream of consciousness nature of the spoken word. This means that sentences can be extremely long, sometimes spanning 30 seconds or more. In contrast, onscreen captioning lines are typically around 50 characters or less, while the Visual Explorer operates at a native 4 second resolution.

How then to preserve STT precision timestamps when using Google Translate? Let's use this Russia 1 episode of 60 Minutes as an example:

What if we simply pass each line on its own to Google Translate? We get the following:

Borel diplomacy, said that a nuclear attack
Against Ukraine will not cause a retaliatory nuclear strike
NATO over Russia will instead be so powerful
Military responses that the entire Russian army would
President Macron said he would respond to nuclear
France will not strike at Ukraine.
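To make the sentence-at-a-time approach described above concrete, here is a minimal Python sketch. It assumes the speech-to-text output is available as a list of words with start and end times in seconds. The names `Word`, `Segment`, `group_into_sentences` and `translate_segments` are hypothetical, invented for this illustration, and the `translate` argument is a stand-in for whatever call is actually made to a translation service. Sentence boundaries are detected naively from trailing punctuation, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the broadcast
    end: float


@dataclass
class Segment:
    text: str
    start: float
    end: float


# Assumption: the transcript marks sentence ends with these characters.
SENTENCE_END = (".", "?", "!")


def group_into_sentences(words: List[Word]) -> List[Segment]:
    """Collapse word-level timestamps to sentence start and stop points."""
    segments: List[Segment] = []
    current: List[Word] = []
    for word in words:
        current.append(word)
        if word.text.endswith(SENTENCE_END):
            segments.append(_to_segment(current))
            current = []
    if current:  # trailing words with no closing punctuation
        segments.append(_to_segment(current))
    return segments


def _to_segment(words: List[Word]) -> Segment:
    text = " ".join(w.text for w in words)
    return Segment(text, words[0].start, words[-1].end)


def translate_segments(
    segments: List[Segment], translate: Callable[[str], str]
) -> List[Segment]:
    """Translate each sentence on its own, keeping its start and stop times."""
    return [Segment(translate(s.text), s.start, s.end) for s in segments]


if __name__ == "__main__":
    words = [
        Word("Hello", 0.0, 0.4),
        Word("world.", 0.5, 0.9),
        Word("How", 1.2, 1.4),
        Word("are", 1.5, 1.6),
        Word("you?", 1.7, 2.0),
    ]
    # str.upper is only a placeholder for a real translation call.
    for seg in translate_segments(group_into_sentences(words), str.upper):
        print(f"{seg.start:.1f}-{seg.end:.1f}: {seg.text}")
```

The sketch also shows the limitation noted in the text: every word inside a sentence inherits the same start and stop times, so a sentence that runs 30 seconds yields a single 30 second segment.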
