Voice Alias

From EuroScopeWiki

Revision as of 22:24, 21 February 2012

Voice Aliases

This section explains an experimental feature that lets you control text-only aircraft using voice recognition in EuroScope.

Ever since voice communication was introduced on the VATSIM network, controlling text-only and voice-capable aircraft at the same time has been a problem. Techniques like text alias definitions and EuroScope's automatic text message generators have alleviated the problem somewhat, but communicating with a text-only aircraft still takes more time. Probably the most difficult part is that other pilots are not aware that you are typing a text message: the radio channel is silent, so they use the chance to transmit their own message and, of course, expect an immediate reply.

The technology to solve that dilemma has been around for a long time: voice recognition. With a good voice recognition system you could simply talk to text-only aircraft on your voice channel, have your message translated to text, and then send it to the aircraft with a single button. Unfortunately, as I found, current speech recognition technology leaves much to be desired when it comes to the precise recognition of controller messages; at least that is true for Microsoft's speech engine. It is able to recognize many words, but there are too many mistakes and misinterpretations. In my tests I was never able to get more than 70-80% of all words recognized correctly. That is far less than what is required for our purposes. I also found that there are some words (e.g. zero) that I was simply unable to teach the engine. That of course would preclude its use in our application.

To achieve a higher level of recognition we have to reduce the set of words and sentences that the system must be able to recognize. If we do this, we can dramatically increase the chance of a correct recognition, even if some words have been recognized incorrectly. I therefore created a simple grammar (with regular expressions). You can define words, alternative lists, repeat blocks, sequences and (probably the most useful) sound association statements. The set of legal sentences that can be formed by these rules is the language that we restrict the recognition process to. EuroScope will try to understand only these sentences and nothing else. In many cases that reduces the number of choices significantly, and hence EuroScope can figure out the correct phrase guided by the context.

The Grammar File Structure

Like nearly everything in the EuroScope environment, the grammar files are simply text files. You can edit them with Notepad or any other text editor. Every line is compiled separately and contains a single rule.

Word Rule

The basic elements of the grammar syntax are the words. You have to define all words you would like to use in your aliases. This also helps the voice recognition engine, because only these words have to be recognized. A word that is not defined here will never be recognized.

There are two methods to define a word. The first is the simple one:

WORD:approach
WORD:runway
WORD:squawk
WORD:land
WORD:takeoff
WORD:taxi

These lines represent simple word definitions.

The second method additionally contains a replacement string:

WORD:zero:0
WORD:one:1
WORD:two:2
WORD:alpha:A
WORD:bravo:B
WORD:charlie:C
WORD:thousand:000
WORD:hundred:00

In the above examples, if the engine recognizes the word, it will add the replacement string to the message. Note that when using a simple rule, EuroScope will automatically surround the recognized word with spaces, while for a replacement rule, the replacement string is added without spaces. You may specify spaces in your replacement string, though. Multiple spaces that result from neighboring words are reduced to single spaces later.

Here are some examples of how EuroScope will convert voice inputs according to rules like those above:

  • "victor echo bravo oscar sierra" --> "VEBOS"
  • "squawk two six two two" --> "squawk 2622"
  • "one thousand" --> "1000"
  • "five hundred" --> "500"
  • "seven thousand five hundred" --> "7500"

The last example demonstrates a special built-in rule that applies when the word "hundred" is recognized. It then suppresses the "000" that has been added by an immediately preceding "thousand" rule.
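The conversion described above can be sketched in a few lines of Python. This is only an illustration, not EuroScope's code: the names WORD_RULES and render are made up for the example, the word list is a small hypothetical subset, and the handling of "hundred" (dropping the "000" of a "thousand" that sits just before the hundreds digit) is one assumed reading of the built-in rule. Trimming spaces at both ends of the result is an assumption as well.

```python
import re

# Hypothetical subset of word rules. None marks a simple rule (WORD:squawk);
# a string is the replacement from a rule such as WORD:two:2.
WORD_RULES = {
    "squawk": None,
    "one": "1", "two": "2", "five": "5", "six": "6", "seven": "7",
    "victor": "V", "echo": "E", "bravo": "B", "oscar": "O", "sierra": "S",
    "thousand": "000", "hundred": "00",
}


def render(spoken):
    """Convert a string of recognized words into message text."""
    parts = []  # (word, emitted text) pairs, in spoken order
    for word in spoken.split():
        replacement = WORD_RULES[word]  # KeyError: the word was never defined
        if word == "hundred" and len(parts) >= 2 and parts[-2][0] == "thousand":
            # Assumed reading of the built-in rule: drop the "000" emitted by
            # the "thousand" that sits just before the hundreds digit.
            del parts[-2]
        # Simple rules get surrounding spaces, replacements are added as-is.
        text = f" {word} " if replacement is None else replacement
        parts.append((word, text))
    # Collapse runs of spaces; trimming both ends is an assumption.
    return re.sub(r" +", " ", "".join(text for _, text in parts)).strip()
```

Calling render("squawk two six two two") with these rules gives "squawk 2622", and render("seven thousand five hundred") gives "7500", as in the examples listed above.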

I also found that I was unable to get the engine to recognize the word "proceed". So I defined the word "direct" to be replaced by "proceed direct". Just a little trick to cope with the impossible.

WORD:direct: proceed direct :

Sounds-like Rule

When experimenting, I found that there were some words that the engine would simply not produce for me. When I spoke them, the engine always picked another word from the list. To cope with these regular recognition failures, I added the sounds-like statement to the grammar syntax. If you find that when you pronounce word xxx the engine always recognizes word yyy, you can solve that problem by adding a rule:

SOUNDS:xxx:yyy

That tells EuroScope: Whenever the speech engine delivers yyy, it may be replaced by xxx if the other syntax rules would allow that. Both xxx and yyy must be words that have been defined before.

Some words sound very similar, for example:

SOUNDS:four:for
SOUNDS:two:to
SOUNDS:descend:descent
SOUNDS:descent:descend

These cannot be told apart at all without sounds rules. Then there might be words that are specific to a particular speaker and that require some tweaking with sounds rules, for example:

SOUNDS:via:zero
SOUNDS:victor:zero
SOUNDS:gate:eight

Just experiment with the recognition engine, and if you always receive a word other than the one you spoke, add a sounds rule.
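To make the substitution logic concrete, here is a minimal Python sketch of how a sounds rule could be applied. It is not taken from EuroScope: SOUNDS_RULES and resolve are hypothetical names, the `allowed` set stands for whatever words the other syntax rules permit at the current position, and preferring a direct hit over a substitution is an assumption.

```python
# Hypothetical subset of sounds rules, stored as (intended, delivered) pairs:
# SOUNDS:four:for becomes ("four", "for").
SOUNDS_RULES = [
    ("four", "for"),
    ("two", "to"),
    ("via", "zero"),
    ("victor", "zero"),
    ("gate", "eight"),
]


def resolve(delivered, allowed):
    """Return the word to use for what the engine delivered, or None.

    `allowed` is the set of words the grammar permits at this position.
    """
    if delivered in allowed:
        return delivered  # assumption: a direct hit beats any substitution
    for intended, heard_as in SOUNDS_RULES:
        if heard_as == delivered and intended in allowed:
            return intended  # the sounds rule rescues the misheard word
    return None  # no rule applies, so this word is a failed match
```

For example, if the engine delivers "zero" where only "victor" is allowed, resolve("zero", {"victor"}) returns "victor". If "zero" itself is allowed at that position, it is kept unchanged.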

Regular Expressions

The word rules we have seen so far define the core elements of our syntax. From these core elements we can form expressions by using one of the rules explained below. Two principles apply to all of these expression-forming rules:

  • In an expression, only words and elements that have been defined previously can be used. So there is no way to define a recursion, which rules out endless loops.
  • All names must be globally unique (you cannot have the same name for two different expressions or words).
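These two principles are easy to check mechanically. The following Python sketch shows one way to do it; check_grammar is a hypothetical helper written for this page, not the syntax check that EuroScope itself performs, and its messages are invented. It assumes the colon-separated line format used throughout this section.

```python
def check_grammar(lines):
    """Return a list of problems that violate the two principles."""
    defined = set()   # names of all words and expressions seen so far
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        kind, *args = line.strip().split(":")
        if kind in ("SENTENCE", "SOUNDS"):
            name, refs = None, args          # no name, only references
        elif kind == "WORD":
            name, refs = args[0], []         # the rest is a replacement string
        elif kind == "REPEAT":
            name, refs = args[0], args[1:2]  # the rest is minimum and maximum
        else:
            name, refs = args[0], args[1:]   # ONEOF and SEQUENCE
        for ref in refs:
            if ref not in defined:
                problems.append(f"line {number}: '{ref}' used before definition")
        if name is not None:
            if name in defined:
                problems.append(f"line {number}: name '{name}' is already defined")
            defined.add(name)
    return problems
```

Because a name is registered only after its own line has been checked, a rule that refers to itself is reported as using an undefined element, which is exactly the recursion the first principle forbids.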

Alternative (One-of) Rule

Using this statement you can define an element that can assume exactly one of the listed alternatives. The first argument is the name of the expression while the remaining arguments are the permitted alternatives. Examples:

ONEOF:number:zero:one:two:three:four:five:six:seven:eight:nine

Here I defined the "number" expression that can be any of the alternatives "zero", "one" … "nine".

ONEOF:direction:left:right

When giving a turn instruction, we can add the "direction" expression and then it can assume only the values "left" or "right".

Repeat Rule

When you need the same type of element zero, one or more times, you can express that with a repeat expression. You can also use it for optional elements, with a minimum of 0 and a maximum of 1 occurrences. The first argument is the name of the expression, followed by the name of the element that can be repeated, followed by the minimum and the maximum number of occurrences.

REPEAT:two_numbers:number:2:2
REPEAT:three_numbers:number:3:3
REPEAT:four_numbers:number:4:4
REPEAT:optional_direction:direction:0:1

The expression "three_numbers" in the second example is useful to define a heading, and "four_numbers" is good for specifying a squawk code. The "optional_direction" can be used as an optional runway designator.

Sequence Rule

To concatenate several expressions you can use the sequence rule. It is as simple as it looks. The first argument is the name of the expression, while the rest are the expressions to be concatenated.

SEQUENCE:qnh_data:by:QNH:four_numbers
SEQUENCE:badov:bravo:alpha:delta:oscar:victor
SEQUENCE:wind_normal:wind:three_numbers:at:two_numbers:knots:optional_gusting

Sentence Rule

Sentence rules are similar to sequence rules, except that they form the top level of the syntax tree and that they have no name argument. Examples:

SENTENCE:descend:altitude:number:thousand:optional_hundred:feet:optional_qnh
SENTENCE:descend:flight:level:three_numbers
SENTENCE:climb:altitude:number:thousand:optional_hundred:feet:optional_qnh
SENTENCE:climb:flight:level:three_numbers
SENTENCE:cleared:for:ILS:approach:rwy_name
SENTENCE:cleared:for:takeoff:rwy_name:wind_data
SENTENCE:cleared:to:land:rwy_name:wind_data
SENTENCE:good:evening:identified
SENTENCE:good:evening:squawk:four_numbers
SENTENCE:turn:direction:heading:three_numbers:optional_base
SENTENCE:descend:to:reach:flight:level:three_numbers:by:waypoint
SENTENCE:direct:waypoint

That means EuroScope always tries to recognize your input as one of the specified sentence rules. After each word that you have spoken, the alias matcher receives all the words spoken so far and selects the sentence rule that matches best.

In other words, the sentence rules are actually your voice aliases.
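How the rule types fit together can be illustrated with a small Python sketch that checks which sentence rules accept a complete list of words. It is a simplification under stated assumptions: the grammar below is a hypothetical miniature and not the shipped file, the function names are invented, and only exact matches of a finished sentence are found. The real matcher works incrementally and picks the best match, which this sketch does not attempt. Sounds rules are left out as well.

```python
# Hypothetical miniature grammar in the file format described above.
GRAMMAR = """
WORD:turn
WORD:heading
WORD:squawk
WORD:left
WORD:right
WORD:zero:0
WORD:one:1
WORD:two:2
WORD:seven:7
ONEOF:number:zero:one:two:seven
ONEOF:direction:left:right
REPEAT:three_numbers:number:3:3
REPEAT:four_numbers:number:4:4
REPEAT:optional_direction:direction:0:1
SENTENCE:turn:optional_direction:heading:three_numbers
SENTENCE:squawk:four_numbers
"""


def parse(text):
    """Split grammar text into named rules and a list of sentence rules."""
    rules, sentences = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *args = line.strip().split(":")
        if kind == "SENTENCE":
            sentences.append(args)
        elif kind in ("WORD", "ONEOF", "REPEAT", "SEQUENCE"):
            rules[args[0]] = (kind, args[1:])
    return rules, sentences


def ends(rules, name, words, pos):
    """Yield each position where element `name` can end if it starts at pos."""
    kind, args = rules[name]
    if kind == "WORD":
        if pos < len(words) and words[pos] == name:
            yield pos + 1
    elif kind == "ONEOF":
        for alternative in args:
            yield from ends(rules, alternative, words, pos)
    elif kind == "REPEAT":
        item, low, high = args[0], int(args[1]), int(args[2])
        reached = {pos}  # positions reachable after `count` repetitions
        for count in range(high + 1):
            if count >= low:
                yield from sorted(reached)
            reached = {e for p in reached for e in ends(rules, item, words, p)}
    else:  # SEQUENCE
        yield from seq_ends(rules, args, words, pos)


def seq_ends(rules, names, words, pos):
    """Yield each position where the elements `names` can end, in order."""
    if not names:
        yield pos
        return
    for middle in ends(rules, names[0], words, pos):
        yield from seq_ends(rules, names[1:], words, middle)


def matching_sentences(rules, sentences, spoken):
    """Return the sentence rules that consume every spoken word."""
    words = spoken.split()
    return [":".join(s) for s in sentences
            if len(words) in seq_ends(rules, s, words, 0)]
```

With this grammar, "turn left heading two seven zero" and "turn heading one two zero" both match the first sentence rule, because "optional_direction" may occur zero times, while "squawk two seven zero" matches nothing because "four_numbers" needs exactly four digits.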

Usage

Microsoft Speech Recognition Engine

First of all you need to install the speech recognition engine. If you are using Windows XP, download the Speech SDK 5.1 from http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&displaylang=en and install it. You need the SpeechSDK51.exe (about 70 MB). If you are using Windows Vista or Windows 7, the integrated speech recognition should be sufficient.

EuroScope Setup

In EuroScope, open the voice setup dialog. At the bottom you will find the field Voice alias grammar. Enter your grammar file name here and check the Enable voice alias recognition check box.

When loading the grammar file, EuroScope performs a syntax check. So you may see a message dialog pop up listing all the problems in your grammar file. Fix them and reload the file. You do not have to leave EuroScope to change the grammar file.

If the grammar file is okay, you can press the Test grammar button and start speaking immediately. Say all the words in your grammar file and test whether the engine recognizes them well. Which words are problematic? Try adding sounds statements to make recognition more stable. And of course test all your sentences again and again. Be patient. Building a good grammar file takes some time and practice. You can test your grammar even without speaking. Just edit the content of the source edit box (left box) to see what sentence would be recognized (see the next section for more details).

Using Voice Aliases

When voice alias recognition is enabled you can use it in the following way:

  • Press the primary PTT button. You can now speak the callsign of the text-only aircraft, but as long as you do not also press the secondary PTT button, you will just be transmitting on your voice channel. This will probably buy you some time, as everybody will be waiting for a read-back. I do not suggest trying to recognize the callsign with the speech engine.
  • Then while the primary PTT is down, press the secondary PTT simultaneously. That will switch on the recognition engine (and will not connect the secondary devices to the microphone). When enabled, the bottom line (prompt and message line) is hidden and two new read/write edit boxes are shown there. They are both empty.
  • While you are talking, EuroScope writes all recognized words to the left edit box, just as they come from the speech engine. In the right edit box, EuroScope shows the best matching alias so far. All matched word rules are shown as is, while words coming from a sounds rule are flagged by enclosing them in braces {}, and failed matches are flagged by square brackets [].
  • When ready, you can release the secondary PTT button. You can then manually edit either of the edit boxes. If you change the input edit box, the content is analyzed for a better sentence match after every keystroke. You can use the replacement strings as well: for example, you can type 2 instead of two, or A instead of alpha.
  • If the result is okay, press ENTER to copy the result to the command line. Or, if you press the secondary PTT again (when the result edit box is not empty), the content of the right edit box is copied to the command editor, but you remain in speech recognition mode for processing a new sentence. This can be used to send multiple sentences to an airplane in a single message.
  • Finally, send the text message that is now in the command line window to the airplane in the normal way.

The ESGrammar.txt File

I have packed an ESGrammar.txt file into the setup (Settings folder), which works well for me. But I guess you will need to adapt the file to your personal environment.

I hope speech recognition will bring some more fun to controlling text-only airplanes.

