Communication for Neural Signal Control


There are approximately 500,000 people worldwide suffering from an affliction called locked-in syndrome. People with locked-in syndrome have total or near-total immobility and cannot speak, yet they retain full cognitive abilities. They are prisoners in their own bodies, unable to move or speak but able to think quite clearly. [1]

 

For most of us, speaking is the primary mode of communicating with others. Beyond verbal speech, we engage in complex communication through social conversational protocols that also include eye gaze, head movement, and hand gesture. [1a] We rarely notice these protocols because using them is mostly automatic and unconscious. People with locked-in syndrome are quite aware of them because they have none of these capabilities. Moreover, unlike that of able-bodied individuals, their need to communicate with others is often critical to their physical safety and comfort.

This report is part of research in progress. The aim is to restore both communicative and environmental control abilities to such people: if members of the locked-in community can communicate and interact through an intermediary computing device, they can regain some of the capabilities they now lack. Since these patients have no way of physically controlling a computer, the core component of the research is a device, invented by Dr. Philip Kennedy of Neural Signals, Inc., that allows patients to control a computer mentally. Currently, locked-in people are able to communicate with a computer by producing neural signals from the motor cortex region of the brain.

Neural signal communication is slow and taxing. To increase its speed and reduce its effort, we need to build an application that makes the best use of strategies that let the patient communicate effectively. Augmentative and Alternative Communication (AAC) techniques are one way to achieve this goal. [2a] These techniques must replace, as far as possible, the capabilities the patients lack: control of verbal speech, eye gaze, head movement, and hand gesture. To do this successfully, many aspects of the situation must be assessed. We need to understand the composition of our design space, existing methods of AAC, and existing designs and research that address this problem. Each of these aspects is detailed below. Different strategies for possible designs are then outlined, and finally the current plan is described based on evaluations of the strategies considered.

 

 

User Model


This user population has specific limitations that must be considered during the design and implementation of any application. Neural control with current applications has shown low selection accuracy, low cursor precision, slow speed, and a high error rate. In addition, every task imposes a high cognitive workload, which causes fatigue. There are currently two navigation models for this population.

The first model uses two signals. The device driver is programmed to activate a switch when the frequency of the signal produced reaches a certain threshold; in this scenario, two electrodes are necessary. Two-dimensional navigation moves the cursor horizontally (right) or vertically (down), and suppression of the signal acts as a mouse click. Because the cursor only moves right and down, screen wrapping and a top-left starting position are implemented.
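The threshold-activated switch described above can be sketched as a rising-edge detector over a stream of signal measurements. This is a minimal sketch, assuming the driver sees one firing-rate sample (in Hz) per time step for a given electrode; the threshold and the sample values are hypothetical, not the actual parameters of the Neural Signals driver.

```python
from typing import Iterable, List


def switch_events(rates: Iterable[float], threshold: float) -> List[int]:
    """Return the sample indices at which the switch activates.

    The switch fires only on a rising edge (the rate goes from below the
    threshold to at or above it), so a sustained high rate produces one
    activation rather than one per sample.
    """
    events = []
    above = False
    for i, rate in enumerate(rates):
        if rate >= threshold and not above:
            events.append(i)
            above = True
        elif rate < threshold:
            above = False
    return events


# Hypothetical firing-rate samples (Hz) from one electrode.
samples = [10, 12, 31, 35, 33, 14, 11, 40, 9]
print(switch_events(samples, threshold=30))  # [2, 7]
```

Firing on the rising edge rather than on every sample above threshold is a design choice that prevents one long burst of activity from registering as a series of clicks.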

The second model uses four signals: nudge down, nudge across, shove down, and shove across. It also incorporates suppression and logical control. A web browser application has been designed around this model.

 

Existing Methods


Navigation methods

1.     2-D

Two-dimensional navigation is simply horizontal and vertical movement around the screen, pixel by pixel. The virtual keyboard currently uses this navigation. It is basic and familiar.

 

2.     Logical control

Logical control differs from two-dimensional navigation in that each movement input jumps from one GUI item to the next rather than moving the cursor pixel by pixel. The movement is, however, still two-dimensional.
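To make the difference between pixel-to-pixel and logical navigation concrete, the sketch below counts the movement inputs each needs to reach the same GUI item. It assumes, hypothetically, that items sit on a regular grid and that each pixel-mode input moves the cursor a fixed number of pixels along one axis; real step sizes depend on the driver and are not given here.

```python
from typing import Tuple


def pixel_moves(start: Tuple[int, int], target: Tuple[int, int], step: int) -> int:
    """Inputs needed when each input moves the cursor `step` pixels along one axis."""
    dx = abs(target[0] - start[0])
    dy = abs(target[1] - start[1])
    # Ceiling division: a partial step still costs a whole input.
    return -(-dx // step) + -(-dy // step)


def logical_moves(start_cell: Tuple[int, int], target_cell: Tuple[int, int]) -> int:
    """Inputs needed when each input jumps one GUI item along one axis."""
    return abs(target_cell[0] - start_cell[0]) + abs(target_cell[1] - start_cell[1])


# Hypothetical layout: items 120 px apart, item (col 0, row 0) at pixel (20, 20).
print(pixel_moves((20, 20), (380, 260), step=10))  # 60
print(logical_moves((0, 0), (3, 2)))               # 5
```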

 

3.     Scanning

The idea behind scanning is that elements (such as letters of the alphabet) are arranged in a fixed pattern, such as a grid. A cursor or highlighter moves through the elements at a fixed rate. When the intended element is highlighted, the user activates a switch (or clicks a mouse button, etc.) to select it [2]. Most scanning methods also include one or more delete keys so that errors can be corrected [6]. Most arrangements use a row-column format: the cursor highlights one entire row at a time, moving on to the next row until a switch is activated. Once a row is chosen, the cursor scans each item in that row (in the column direction) until the user selects an item [2, 6]. Two main variants have emerged from this concept: character-based and word-based input.
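The cost of one selection under row-column scanning can be computed directly from an element's position. This is a minimal sketch, assuming scanning always starts at the top row, each highlight step takes one scan interval, and the selection happens on the step where the target is highlighted; the grid contents are hypothetical.

```python
from typing import List, Tuple


def find(grid: List[List[str]], item: str) -> Tuple[int, int]:
    """Return the (row, column) position of item in the grid."""
    for r, row in enumerate(grid):
        if item in row:
            return r, row.index(item)
    raise ValueError(f"{item!r} is not in the grid")


def scan_steps(grid: List[List[str]], item: str) -> int:
    """Highlight steps needed to select item with row-column scanning.

    Rows are highlighted one at a time starting with row 0; after the user
    selects a row, its cells are highlighted left to right. Each selection
    therefore takes two switch activations: one for the row, one for the cell.
    """
    r, c = find(grid, item)
    return (r + 1) + (c + 1)


# Hypothetical alphabetical grid; "<" stands for backspace.
grid = [
    ["A", "B", "C", "D", "E", "F"],
    ["G", "H", "I", "J", "K", "L"],
    ["M", "N", "O", "P", "Q", "R"],
    ["S", "T", "U", "V", "W", "X"],
    ["Y", "Z", " ", ".", ",", "<"],
]
print(scan_steps(grid, "A"))                      # 2
print(sum(scan_steps(grid, ch) for ch in "THE"))  # 16
```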

In the character-based approach, each character of the alphabet, plus other elements such as a blank space (to separate words), punctuation marks, backspace, and possibly a backward key (which removes the entire previous word), is arranged in a rectangular grid. The arrangement can vary depending on the user [2]. While an alphabetical arrangement may be easy for the user to understand, its communication rate may not be optimal compared with a frequency-based arrangement. Specifically, if the keys are arranged so that the most frequently used characters are placed closest to the cursor's starting point, a significant increase in communication rate may be seen.
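The benefit of a frequency-based layout can be estimated by placing the most frequent characters in the cells that row-column scanning reaches first. This is a minimal sketch, assuming a cell at row r, column c costs (r + 1) + (c + 1) scan steps, and using a tiny hypothetical sample sentence as a stand-in for real English frequency data.

```python
from collections import Counter
from typing import Dict, List, Tuple

Position = Tuple[int, int]


def cell_cost(pos: Position) -> int:
    """Row-column scanning steps to reach a cell: (row + 1) + (column + 1)."""
    r, c = pos
    return (r + 1) + (c + 1)


def alphabetical_layout(symbols: List[str], cols: int) -> Dict[str, Position]:
    """Fill the grid row by row in the order the symbols are given."""
    return {s: divmod(i, cols) for i, s in enumerate(symbols)}


def frequency_layout(symbols: List[str], sample: str, rows: int, cols: int) -> Dict[str, Position]:
    """Place the most frequent symbols in the cells the scanner reaches soonest."""
    counts = Counter(sample)
    by_frequency = sorted(symbols, key=lambda s: -counts[s])  # stable for ties
    cells = sorted(((r, c) for r in range(rows) for c in range(cols)), key=cell_cost)
    return dict(zip(by_frequency, cells))


def total_cost(text: str, layout: Dict[str, Position]) -> int:
    """Total scanning steps needed to type text with the given layout."""
    return sum(cell_cost(layout[ch]) for ch in text)


symbols = list("abcdefghijklmnopqrstuvwxyz ")
sample = "the cat sat on the mat"  # hypothetical stand-in for real frequency data
alpha = alphabetical_layout(symbols, cols=6)
freq = frequency_layout(symbols, sample, rows=5, cols=6)
print(total_cost(sample, alpha), total_cost(sample, freq))
```

Because the frequency layout pairs the highest counts with the lowest costs, it can never do worse than the alphabetical layout on the same text; how much better it does on real conversation is an empirical question.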

In the word-based approach, the grid has two layers. On the first layer, the elements are subject headings; on the second, they are words related to the chosen heading. In other words, the elements of the grid form a predefined vocabulary in which every word is grouped into a category. To select a word, the user first selects the subject heading and then the word itself, which requires four clicks rather than the two needed in the character-based approach [6]. The word-based approach is thus twice as deep as the character-based approach.

A note that applies to both methods: the research in [2] focused on reducing timing errors for people using scanning systems. Brewster showed that when audio output accompanies the movements of the cursor, the user is far less likely to make a timing error. Since people associate timing more naturally with sound than with visual movement, this conclusion makes sense. Some caution is warranted, however, as the result was based on a sample size of one. In any case, the idea is intriguing and should be considered for any implementation in our research.

Each approach has advantages and disadvantages. The character-based method requires minimal device manipulation (or, in our case, minimal thought) to select an element. In addition, the user's vocabulary is unlimited, whereas in the word-based approach the vocabulary is finite. While the word-based method requires twice as many switch activations per element, each element can be much richer than an element of the character-based technique. However, in the word-based approach, words that commonly appear together in a sentence fall into separate categories, which heavily increases the cognitive load, and that effort is not dedicated to the conversation itself. In other words, the user must think on two different levels in order to converse.

Another scanning possibility is a binary arrangement, as mentioned in [1]. Here, the letters of the alphabet are arranged in a balanced binary tree. A letter is displayed on the screen (starting with M or N), and the cursor scans through the choices "earlier" and "later" (or "select" when the intended letter is displayed), efficiently navigating the tree. This may increase the speed at which letters are selected, yet conversation would still occur on two different levels of thought.
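Navigating a balanced binary tree of letters with "earlier" and "later" amounts to a binary search over the alphabet. This is a minimal sketch, assuming the displayed letter is always the middle of the remaining range, rounded down, so the first letter shown is M; rounding up instead would start at N. It counts only the earlier/later/select decisions, not the scanning time needed to reach each choice.

```python
import string
from typing import List


def choices_to_letter(target: str, alphabet: str = string.ascii_uppercase) -> List[str]:
    """Sequence of 'earlier'/'later'/'select' choices needed to reach target.

    The displayed letter is always the middle of the remaining range, which
    corresponds to walking down a balanced binary tree over the alphabet.
    """
    lo, hi = 0, len(alphabet) - 1
    idx = alphabet.index(target)
    choices = []
    while True:
        mid = (lo + hi) // 2
        if idx == mid:
            choices.append("select")
            return choices
        if idx < mid:
            choices.append("earlier")
            hi = mid - 1
        else:
            choices.append("later")
            lo = mid + 1


print(choices_to_letter("M"))  # ['select']
print(choices_to_letter("A"))  # ['earlier', 'earlier', 'earlier', 'select']
print(max(len(choices_to_letter(ch)) for ch in string.ascii_uppercase))  # 5
```

With 26 letters no letter is more than five decisions away, compared with up to 26 steps for a linear alphabetical scan.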

Put graphic here

 

4.     Wrapping

Wrapping lets the cursor move off one edge of the screen and reappear at the opposite edge. Users in this population have difficulty controlling the cursor and cannot move it backward, so the cursor wraps from the far right of the screen back to the far left, letting them start again.
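Wrap-around for a cursor that can only move right or down reduces to modular arithmetic on its coordinates. This is a minimal sketch, assuming a hypothetical screen size and step size in pixels; it wraps vertically (bottom to top) as well as horizontally, on the assumption that the downward-only movement of the two-signal model needs the same treatment.

```python
from typing import Tuple


def move(pos: Tuple[int, int], direction: str, step: int,
         width: int, height: int) -> Tuple[int, int]:
    """Move the cursor right or down by step pixels, wrapping at the screen edges."""
    x, y = pos
    if direction == "right":
        x = (x + step) % width   # past the right edge: reappear at the left
    elif direction == "down":
        y = (y + step) % height  # past the bottom edge: reappear at the top
    else:
        raise ValueError("the cursor can only move 'right' or 'down'")
    return x, y


# Hypothetical 1024 x 768 screen, 100-pixel steps, starting at the top left.
print(move((1000, 50), "right", 100, 1024, 768))  # (76, 50)
print(move((10, 700), "down", 100, 1024, 768))    # (10, 32)
```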

 

Communication methods

1.     Spelling

 

Morse

Words+ includes this feature. The strategy is to minimize the input required to form a word by entering it in Morse code, in which each letter is a short sequence of dots and dashes.
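Morse entry reduces every letter to a short sequence of two symbols. This is a minimal sketch of the encoding and decoding steps using the standard International Morse code letters; how Words+ maps switch activations to dots, dashes, and letter breaks is not described here, so the convention of separating letters with a space is an assumption.

```python
# Standard International Morse code for the letters A-Z.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}
DECODE = {code: letter for letter, code in MORSE.items()}


def encode(word: str) -> str:
    """Encode a word as Morse letters separated by single spaces."""
    return " ".join(MORSE[ch] for ch in word.upper())


def decode(signal: str) -> str:
    """Turn space-separated Morse letters back into text."""
    return "".join(DECODE[code] for code in signal.split())


def symbol_count(word: str) -> int:
    """Number of dot/dash inputs needed, not counting letter separators."""
    return sum(len(MORSE[ch]) for ch in word.upper())


print(encode("help"))                # .... . .-.. .--.
print(symbol_count("help"))          # 13
print(decode(".... . .-.. .--."))    # HELP
```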

 

T9

Most telephone interfaces use this strategy to let users enter letters, for example to search a directory of users. A telephone keypad has nine digits besides zero. The 1 key carries no letters, and each of the remaining keys (2 through 9) carries three letters, except 7 and 9, which carry four (PQRS and WXYZ). Pressing 2, for example, represents a choice of A, B, or C. The user then either needs to choose one of the letters, or a predictive element must help choose. We could use this idea and rearrange the organization to suit the user population.
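The predictive element amounts to matching a key sequence against a word list and ranking the matches. This is a minimal sketch, assuming the standard telephone letter groups and a small hypothetical vocabulary ranked by how often the user has chosen each word; a layout rearranged for this population would only change the KEYS table.

```python
from typing import Dict, List

# Standard telephone keypad letter groups (digits 2-9).
KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYS.items() for letter in letters}


def key_sequence(word: str) -> str:
    """Digits a user would press to enter word, one press per letter."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def candidates(keys: str, vocabulary: Dict[str, int]) -> List[str]:
    """Words matching the key sequence, most frequently used first."""
    matches = [w for w in vocabulary if key_sequence(w) == keys]
    return sorted(matches, key=lambda w: -vocabulary[w])


# Hypothetical vocabulary with usage counts.
vocab = {"good": 12, "home": 30, "gone": 5, "hood": 1, "water": 8}
print(key_sequence("home"))       # 4663
print(candidates("4663", vocab))  # ['home', 'good', 'gone', 'hood']
```

The example shows why prediction matters: four common words share the sequence 4663, so without ranking the user would still have to choose among them.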

 

Virtual Keyboard

The keyboard currently used for neural control is WiVik, a virtual keyboard: an on-screen grid of letters, comma, period, and backspace. The user navigates to each letter using two-dimensional navigation and presses the appropriate buttons. This is slow (about 5 words per minute) and error-prone, but it gives the user a high level of control.

 

2.     Grammatical

Compansion is a technique that sits between the word prediction and semantic compaction approaches to communication. Formally, compansion translates a sequence of content words into a grammatically correct and relevant sentence. For example, when the user enters the phrase "apple eat john", a compansion method would produce the output "John has eaten an apple" but never "The apple ate John." [5] If the input content words could be interpreted in several ways, the user is presented with a list of choices from which to select. [6]

When combined with word prediction, compansion yields an even faster method for producing output. Compansion can also be combined with Minspeak™. Whereas a standard Minspeak™ user would enter the proper icon sequences to generate a functional sentence and then switch to a different mode to correct tenses and other grammatical aspects, compansion can make these changes automatically, so here too we see a dramatic increase in efficiency. Compansion also includes a concept of "icon prediction." The described method does not, however, predict sequences of icons. Rather, once an icon is entered, all invalid subsequent icons are disallowed. The user still must enter every icon, but the possible errors are reduced to errors of meaning. [5]

There is one more advantage relevant to our research. Our population is cognitively intact and will therefore likely have a keen grasp of English grammar. Compansion is designed specifically for people like those we are aiming to help. [6]

 

3.     Iconic

Whether called semantic compaction (in research) or Minspeak™ (commercially), this concept takes a fairly radical approach to limited-input communication. Developed by Bruce Baker around 1980, the idea is to give the user a keyboard consisting of only a small number of icons (32, 64, or 128). These icons are ambiguous: they represent concepts only, and each icon represents more than one concept. However, when the icons are placed in a logical sequence, their context generates unambiguous ideas. [4, 7]

Take, for example, the icon APPLE. Meanings associated with this icon include food, eat, red, or even New York (i.e., the Big Apple). Typing this icon alone generates no particular output, but when it is combined with the RAINBOW icon, the unambiguous meaning "red" is extracted from the sequence. In this manner, a small number of icons, when combined, form a large set of language elements. [7]
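Semantic compaction can be modelled as a lookup from icon sequences to language elements, and the icon-prediction behaviour described under Grammatical above (disallowing invalid next icons) falls out of the same table as prefix filtering. This is a minimal sketch: only APPLE followed by RAINBOW giving "red" comes from the example above; the other sequences and icon names are hypothetical and are not taken from any Minspeak™ MAP.

```python
from typing import Dict, Optional, Set, Tuple

# Icon sequences mapped to the language element they generate.
# Only ("APPLE", "RAINBOW") -> "red" comes from the text; the rest are hypothetical.
SEQUENCES: Dict[Tuple[str, ...], str] = {
    ("APPLE", "RAINBOW"): "red",
    ("APPLE", "VERB"): "eat",       # hypothetical
    ("APPLE", "NOUN"): "food",      # hypothetical
    ("APPLE", "CITY"): "New York",  # hypothetical
}


def interpret(icons: Tuple[str, ...]) -> Optional[str]:
    """Return the element for a complete sequence, or None if incomplete or unknown."""
    return SEQUENCES.get(icons)


def allowed_next(prefix: Tuple[str, ...]) -> Set[str]:
    """Icons that can follow prefix in some valid sequence; all others are disallowed."""
    n = len(prefix)
    return {seq[n] for seq in SEQUENCES if len(seq) > n and seq[:n] == prefix}


print(interpret(("APPLE",)))             # None: APPLE alone is ambiguous
print(interpret(("APPLE", "RAINBOW")))   # red
print(sorted(allowed_next(("APPLE",))))  # ['CITY', 'NOUN', 'RAINBOW', 'VERB']
```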

Baker claims that semantic compaction systematizes a natural process that we all go through in the course of conversation. In other words, although the user is searching for icons to represent his or her thoughts, this is not a separate level of cognition (as it is in word prediction). Moreover, whereas abbreviations in abbreviation expansion become arbitrary when there are many of them, icons make intuitive sense to the "speaker" and are much more easily remembered. Minspeak™ keyboards also contain alphanumeric keys so that users can generate unique words or add noun and verb endings as appropriate. With this in mind, Baker claims that the number of keystrokes (relative to standard keyboard typing) can drop by as much as 60%. [7]

There are, however, some disadvantages to semantic compaction. Users must remember a large number of sequences in order to use the Minspeak™ system effectively (though see the discussion of "icon prediction" under Grammatical above) [4, 6]. Moreover, the output of semantic compaction is usually functional: it allows the user to make a point but may not be grammatically correct. Minspeak™ does provide keys to fix this, but using them greatly slows the communication rate, and the benefit of the system diminishes or vanishes [6]. Since we want full-scale communication with no adverse effects for our population, this poses a problem. There is also a time consideration: at least 90 hours of training and practice are recommended to master a simple 2,000-word vocabulary. [7]

As a final note, Minspeak™ systems are licensed exclusively to the Prentke Romich Company (PRC), which provides both physical and virtual models of various Minspeak™ systems. PRC also provides built-in vocabularies called MAPs. These free the user from defining his or her own icon sequences and allow practice to begin immediately. Both PRC and Baker assert that continued practice greatly increases the effectiveness of communication. [7, 10]

Mind Express

 

 

4.     Prediction

Word

Word prediction is another method; it does not generate input but rather allows minimal input to be mapped to richer input. The concept follows from the name: as the user types the beginning of a word, the input software attempts to predict the word the user intends to enter. There are many ways to accomplish this, the most typical being a list of possible matches displayed after a certain number of keystrokes. An alphabetical list of matches could be shown, but this is often unhelpful when the goal is minimal input. Many techniques are therefore based on how frequently words appear in English or in normal English conversation. Other techniques rely on how recently the user entered a word. [3]
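The frequency and recency techniques described above can be combined in a small predictor. This is a minimal sketch, assuming a hypothetical scoring rule (the word's usage count plus a fixed bonus if it is among the last few words entered); it is not the syntax-based method of [3], and the class name, parameters, and weights are illustrative.

```python
from collections import Counter, deque
from typing import Deque, List


class WordPredictor:
    """Suggest completions ranked by how often and how recently each word was used."""

    def __init__(self, recent_size: int = 3, recency_bonus: int = 2) -> None:
        self.counts: Counter = Counter()                      # word -> times used
        self.recent: Deque[str] = deque(maxlen=recent_size)   # last few words used
        self.recency_bonus = recency_bonus                    # hypothetical weighting

    def learn(self, word: str) -> None:
        """Record that the user entered word."""
        word = word.lower()
        self.counts[word] += 1
        self.recent.append(word)

    def suggest(self, prefix: str, limit: int = 3) -> List[str]:
        """Return up to limit known words starting with prefix, best first."""
        prefix = prefix.lower()

        def score(word: str) -> int:
            bonus = self.recency_bonus if word in self.recent else 0
            return self.counts[word] + bonus

        matches = [w for w in self.counts if w.startswith(prefix)]
        # Highest score first; ties broken alphabetically.
        return sorted(matches, key=lambda w: (-score(w), w))[:limit]


predictor = WordPredictor()
for word in ["water", "water", "water", "wash", "want"]:
    predictor.learn(word)
print(predictor.suggest("wa"))   # ['water', 'want', 'wash']
print(predictor.suggest("was"))  # ['wash']
```

Because the predictor learns from what the user actually enters, unique words the user relies on gradually rise to the top of the list.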

While these methods have been shown to be somewhat useful, none of them considers the syntax in which words occur. The research in [3] proposes a word prediction method based on syntactic analysis of English sentences. Natural language has an inherent redundancy, which can be quite useful for prediction (although, for our purposes, we must consider whether conversation follows the same pattern). A system reflecting this idea has been built on the chart-parsing method referenced in [3]. Moreover, this system adapts to the user's common language structure and vocabulary, thus incorporating localized "frequency" and "recency" techniques. [3] The conversational application CHAT, developed by Newell et al. [8], uses some of the same ideas, applied at the level of conversation rather than individual words.

This system has many advantages. As words become longer, word prediction lets the user type very little of the actual word. By definition, prediction reduces the time needed for input over a large vocabulary, provided the method is implemented in a minimally intrusive way. Moreover, with syntax analysis, the possibility of predicting entire sentences greatly reduces the effort needed to enter them. The disadvantage, however, is that, as with word-based scanning, conversation operates on more than one level, since the user constantly selects from a list of words and then returns to the actual conversation. Many such systems exist, but there is always a tradeoff between cognitive load and physical effort.

 

 

Phrase

Sentence

Abbreviation expansion, as the name implies, minimizes the amount of input needed to generate whole words or phrases; it is a form of sentence prediction. The user predefines a number of short character sequences, each representing a longer, commonly used word or group of words (such as a phrase). The software handling the user's input automatically replaces (or "expands") each abbreviation with its defined word or words. [3]
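The replacement step itself can be sketched as a table lookup applied to each typed token. This is a minimal sketch, assuming abbreviations are expanded only when typed as whole space-separated tokens; the table is hypothetical apart from idf, which is the example used under Error handling strategies.

```python
from typing import Dict

# Hypothetical user-defined abbreviations.
ABBREVIATIONS: Dict[str, str] = {
    "idf": "I am doing fine",
    "ty": "thank you",
    "pls": "please",
}


def expand(text: str, table: Dict[str, str]) -> str:
    """Replace every whole-token abbreviation in text with its expansion."""
    return " ".join(table.get(token.lower(), token) for token in text.split())


print(expand("idf ty", ABBREVIATIONS))           # I am doing fine thank you
print(expand("pls call my son", ABBREVIATIONS))  # please call my son
```

Splitting on whitespace keeps the sketch short, but it means punctuation attached to a token (for example "ty.") blocks expansion; a real system would need a proper tokenizer.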

The obvious advantage of this strategy is that the user has to type very little to maintain a dialogue. However, there are two major disadvantages. The first is one we have already encountered: abbreviations can cover only part of the user's actual vocabulary, so long, infrequently used words still have to be typed in full [3]. The second is that as the number of abbreviations rises, so does the amount the user must memorize, and the abbreviations soon become quite arbitrary. This can lead to many input errors, which can be very costly in aided conversation. [7]

5.     Conversation

The approach known as conversational momentum is quite different from the others mentioned so far. The system's prediction capabilities are based on the well-defined structures of normal conversation: the social rules of conversational negotiation are built into the system, which has no other predictive capabilities. Some have criticized this approach because speech can vary so widely. However, while speech is "infinitely variable... it does not follow... that it is totally unpredictable" [8]. The core idea of this system is the user "participating in a satisfying way in social encounters" [8], which is exactly the goal of our research.

The application relies on speech act prediction instead of word prediction. The CHAT system has six menus, one per speech act: greeting, response, smalltalk, discussion, wrap-up, and farewell. Each menu offers two choices, "say it" and "filler." A filler is a response such as "Yeah" or "Uh-huh" that signals agreement; fillers keep the other participant talking. In addition, each response has an emotional choice attached to it, so the user can respond politely or angrily.
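The menu structure described for CHAT can be represented as a small data model. This is a minimal sketch: the six speech acts, the say-it/filler split, and the polite/angry choice come from the description above, but the stored phrases, the `choose` function, and the data layout are hypothetical and are not CHAT's actual implementation.

```python
from typing import Dict, List

SPEECH_ACTS = ["greeting", "response", "smalltalk", "discussion", "wrap-up", "farewell"]
FILLERS = ["Yeah", "Uh-huh"]

# Pre-stored utterances per speech act and emotional tone (hypothetical content).
SAY_IT: Dict[str, Dict[str, List[str]]] = {
    "greeting": {"polite": ["Hello, nice to see you."], "angry": ["Oh. It's you."]},
    "farewell": {"polite": ["Goodbye, thanks for coming."], "angry": ["Just go."]},
}


def choose(act: str, option: str, tone: str = "polite", index: int = 0) -> str:
    """Return the utterance for one menu selection.

    option is "say it" (a stored utterance for the act and tone) or
    "filler" (a short remark that keeps the other participant talking).
    """
    if act not in SPEECH_ACTS:
        raise ValueError(f"unknown speech act: {act}")
    if option == "filler":
        return FILLERS[index % len(FILLERS)]
    if option == "say it":
        return SAY_IT[act][tone][index]
    raise ValueError("option must be 'say it' or 'filler'")


print(choose("greeting", "say it"))             # Hello, nice to see you.
print(choose("discussion", "filler", index=1))  # Uh-huh
print(choose("farewell", "say it", "angry"))    # Just go.
```

Only two acts have stored utterances here; a "say it" selection for any other act raises KeyError until the user has pre-generated text for it, which mirrors the dependence on preparation discussed next.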

That does not mean the system is without limitations. Although conversation is predictable, most of its content changes from one social interaction to the next. In other words, the text we can generate before a conversation and will likely use constitutes only about 20% of the entire conversation. In addition, by relying entirely on pre-generated text (as CHAT does), we assume that the user has a lot of time to think about the conversations that will take place in the near future. We must also assume that the user understands conversation structure, since he or she will have to designate the current stage of the conversation as it occurs. CHAT was also tested on only a small sample of people. However, this research showed a possible rate of up to about 50 words per minute, which would be a great improvement over the 5 words per minute we currently see. [8]

Conversational momentum nevertheless has a number of advantages. CHAT can log conversations, which means the user can recall personal phrases or relay the content of a conversation to many people without re-entering it each time. Additionally, of the remaining 80% of a conversation, about half is spent listening. The CHAT user has at his or her fingertips a list of "filler" remarks (such as "yeah" and "uh huh") to reassure the other participant, just as in normal conversation. There is also much promise in combining this system with the other capabilities described here.

6.     Context

Justine Cassell’s work

 

7.     Emotional

Newell and Cassell’s work

 

Design Space


Forming a design space is a useful process. It helps us understand all of the dimensions we need to consider for the resulting design. It is an outline that will change as new ideas and devices are discovered, and it can serve as a reference for future work. As we learn more about existing research, we can better combine the right elements into a viable strategy for designing the application.

Device

The device, which contains a glass electrode, is implanted in the patient's motor cortex. Approximately 90 days after surgery, patients are able to generate brain signals that travel through the electrode. A receiver outside the patient's body captures these signals, and after several further transformations, the signals are translated into mouse movements on a computer monitor. [1, 9]

Form Factor

  1. Headset – Currently, anything placed on the patient's head falls off easily because he cannot control his head. A special headset might be an appropriate way for him to receive auditory input privately. There is a concern regarding the aesthetics of the headset: it should not draw too much attention to his disability.
  2. Haptic glove – Some sensation in the patient's hand has been restored, so a glove that stimulates the hand might be a source of feedback. We are currently using a solenoid with a doorbell buzzer around a finger to send vibrational “taps” to see whether this kind of feedback is effective.

 

Modalities – Input/Output

  1. Vision – This is the main channel through which the patient receives output from the screen, in the form of text and graphics. Other modes should also be available, since modal redundancy can help ensure complete feedback.
  2. Audio – This is a good secondary source of output for this user population. Synthesized or natural speech feedback of text would be helpful if the primary modes are not functioning, and redundant outputs can only help him communicate better. [6a] A scaled sound marking when a dwell time starts and stops, or when the scanner is about to move to the next button, would convey important information. Adding sound effects could also make the application more fun to use.
  3. Haptic – The idea of the haptic glove is discussed above.
  4. Smell – There is no technology to support smell as input or output, but it should not be ignored, since it is something our population can do well.
  5. Taste – There is also no technology to support taste, but it should be considered for the future.
  6. Neural – This is the only current input for this population: the ability to control increases in the frequency of neural signals.

 

Technology

  1. Logical control – Currently, all applications the user works with move the cursor pixel by pixel. Logical control would move the cursor from item to item on the GUI, which could save the patient a lot of time. We should be cautious, however, since training will be required to adapt to the change.
  2. Scanning – This has proved to be a good method for many users with disabilities. [3a] We need to assess whether it is a good technique for this population.
  3. Word prediction – Word prediction is an important way to increase the number of words per minute that an individual with disabilities can enter. There is a cognitive-physical tradeoff to recognize and consider when designing word prediction methods; understanding how each side of the tradeoff affects the user will help ensure productive use of the method. [3a] [5a]
  4. Compansion – There are appropriate places for this method. One being considered is the human figure model. If the patient clicks on the figure's left hand and the words hurts, move, feel, and hold pop up, the patient can choose a single word that communicates a lot of information. The selection by itself does not form a grammatical sentence, but compansion could turn it into one to show to a nurse or caretaker: the monitor for the participating communicator would show 'My left hand hurts.'
  5. Abbreviation expansion – There are appropriate places for this method as well; it is best combined with standard word prediction. If the patient is typing text and has a long word to enter, they might try abbreviating it and see whether the system can deduce the whole word for them.
  6. Artificial intelligence – Artificial intelligence is essential for an application like this; it appears to be necessary in almost every aspect of the design. One example is using word frequency and recency to rank the first choices offered by word prediction. The environmental control component would also need it in order to communicate with the communication device. And if the patient enters a unique word that he uses, the system can add that word as a predictive choice, reducing the number of times the patient must enter a word letter by letter.
  7. Adaptive gain/Sticky icons – Adaptive gain uses the cursor's speed to judge when the user is close to the target. Sticky icons help a user whose movement toward a target is inaccurate: as the cursor nears a target, the system pulls it into the target, where it can be activated. Scott Hudson suggests combining adaptive gain with sticky icons so that the cursor becomes sticky only when it slows down. [4a]
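Hudson's suggestion, that icons capture the cursor only when it slows down, can be sketched as a per-update rule. This is a minimal sketch, assuming cursor speed is measured as pixels moved per update and that the speed threshold and capture radius are hypothetical tuning values; it illustrates the combination rather than reproducing the method in [4a]. It uses `math.dist`, available from Python 3.8.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def update_cursor(prev: Point, new: Point, targets: List[Point],
                  slow_speed: float = 5.0, capture_radius: float = 20.0) -> Point:
    """Return the cursor position after one movement update.

    If the cursor moved slowly (suggesting the user is homing in on a target)
    and a target centre lies within capture_radius, the cursor snaps to the
    nearest such target. Fast movements pass over targets unhindered.
    """
    speed = math.dist(prev, new)
    if speed > slow_speed:
        return new
    nearby = [t for t in targets if math.dist(new, t) <= capture_radius]
    if not nearby:
        return new
    return min(nearby, key=lambda t: math.dist(new, t))


buttons = [(100.0, 100.0), (200.0, 100.0)]
print(update_cursor((60.0, 100.0), (90.0, 100.0), buttons))  # (90.0, 100.0): fast, no snap
print(update_cursor((88.0, 100.0), (90.0, 100.0), buttons))  # (100.0, 100.0): slow, snaps
```

Gating the snap on speed means the cursor is not pulled into every icon it crosses on the way to a distant target, which matters for a user who cannot move backward.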

 

Metaphors

  1. Morse code – This has been tried with able-bodied individuals to assess the learning curve associated with Morse code. Melody Moore results
  2. Neural gestures (sign language) – In the future, when we understand how to get differential signals that we can map to many different thoughts of movement, neural gestures could be very exciting. This will take experimentation and thorough training. Does thinking about making the sign for ‘d’ produce the same signal as ‘b’?
  3. Typing in text – This is very slow but quite necessary. Prediction strategies are the way to cope with the slow speed.
  4. The application as a role – The application could take on a caretaker role, acting as the user's companion. Many roles are possible, but we chose this one because of how critical the caretaker role is. On the other hand, it could represent the son, in which case the application is more an embodiment of the user than a companion. It is not clear which role is more appropriate; a combination of the two may be most effective.
  5. The human form – A drawing of a human figure would occupy part of the screen and could be activated by a click. If he clicks on the figure's left hand, word options appear that describe something about his left hand that he would like to express. In a way, this could be seen as an avatar, but it is more a quick way to form possible sentences.

 

Aesthetic

  1. Pleasant sound – Audio is important feedback. Sounds should be pleasant to the user and should not become irritating over time. The audio should also be meaningful.
  2. Clear contrasting colors – Users with low vision need contrasting colors; a dark blue background with yellow letters, for example, is often more visible. Subtle, sophisticated colors may be attractive but are not appropriate for this population.
  3. Embodied agent photo – Should we use photos of the user with different expressions as icons? If three icons represent happy, neutral, and sad, is it better to use drawn representations or actual photographs?
  4. Embodied Agent representation – How do we represent the user? Do we try to be a companion or do we make the application embody the user as much as possible?

 

Organization of Information

To organize information well, we need to understand the different types of information. There are four main types to consider: grammatical, conversational, contextual, and emotional. Sentence formation should be organized grammatically. The chat part of the application should be organized by parts of a conversation; for example, greeting, body, and farewell are parts of a conversation that might have different word and phrase prediction menus. Context should always be part of the application, although when and where it should be implemented is not yet clear; time of day, ambient information, and biological information are aspects of this organization. Emotional information should also be integrated at all times. Is the patient upset or depressed? Is the patient excited and happy? Word prediction for sentence formation and conversation should reflect this, making emotion a sub-organizing aspect of communication.

 

Problem Areas


Designing for one person?

As we investigate the technological possibilities, we must assess user needs. Under current conditions, there is one main user who is able to do testing. Patients with locked-in syndrome share nearly complete immobility and complete cognitive capacity, but there are differences within this group. Some develop low vision; some dislike strategies that others prefer. How do we best design with specific information in mind while still generalizing to the locked-in population? Moreover, should we try to generalize to an even larger population?

Breaking the internal world wall

People with locked-in syndrome live within an internal world. Currently, the only times they break out into the external world are when they are able to communicate; one person, for example, can blink to respond quickly to certain questions. Before we get an application running constantly for them to interact with, we need to know whether our ideas are suitable. We can start by testing able-bodied users with a simulator that uses data mimicking the locked-in population, to capture general problems. Then we can test with the locked-in population to see whether we have truly succeeded. Until they are using the application every day, it will be hard to know whether we have an appropriate design.

Error handling strategies

Our goal is to combine methods that allow minimal input to produce maximal correct output. Even after thorough training, our users are fairly likely to make many unintentional "clicks" or "keystrokes" during the course of communication. We need to consider how errors can be handled with the various methods. Clearly, errors should be handled as quickly as possible to allow the high rate of communication we desire.

In scanning, errors typically result from users activating a switch too early or too late. Because most current scanning tools first highlight entire rows and then highlight each cell within the chosen row, activating the row switch too early or too late is more costly than mistiming the cell switch. If the wrong row is chosen, every cell in that row must be scanned before scanning can restart. A good way to minimize this problem is to adjust the scanning speed to match the user's control, although the patient's fatigue will change the appropriate speed over time. We should also design the grid with a restart cell in every row: if the user selects the wrong row, the first cell highlighted within it could be a restart option. With the word-based approach, an early switch activation is also very costly, since the user must then scan through an entire screen of non-target words in order to restart. In either case, multiple successive errors can impact performance significantly, and again, a way to start over at each step would help avoid frustration. Of course, these ideas must be tested with the user population and then refined.
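The trade-off behind a restart cell can be estimated with a simple step-count model of row-column scanning, in which reaching the cell at row r, column c costs (r + 1) + (c + 1) steps. This is a minimal sketch under stated assumptions: after a wrong row is selected without a restart cell, the scanner steps through every content cell of that row once and then resumes row scanning from the top; with a restart cell, it is the first cell highlighted in each row and pushes all content cells one step later. The function names and grid width are hypothetical.

```python
from typing import Tuple


def selection_steps(target: Tuple[int, int], restart_cell: bool) -> int:
    """Steps to select target (row, content column) with no errors.

    A restart cell, if present, is highlighted first in every row and
    pushes every content cell one step later.
    """
    r, c = target
    return (r + 1) + (c + 1) + (1 if restart_cell else 0)


def steps_after_wrong_row(wrong_row: int, target: Tuple[int, int],
                          cols: int, restart_cell: bool) -> int:
    """Steps to select target when the user first selects wrong_row by mistake.

    cols is the number of content cells per row. Without a restart cell, the
    user waits while every content cell in the wrong row is highlighted.
    """
    reach_wrong_row = wrong_row + 1
    recover = 1 if restart_cell else cols
    return reach_wrong_row + recover + selection_steps(target, restart_cell)


print(selection_steps((2, 3), restart_cell=False))                   # 7
print(selection_steps((2, 3), restart_cell=True))                    # 8
print(steps_after_wrong_row(1, (2, 3), cols=6, restart_cell=False))  # 15
print(steps_after_wrong_row(1, (2, 3), cols=6, restart_cell=True))   # 11
```

Under these assumptions a restart cell costs one extra step on every error-free selection but saves cols - 1 steps on every wrong-row error, so whether it pays off depends on the user's error rate, which is exactly what testing with the user population would need to establish.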

Abbreviation expansion presents a slightly different problem. Here, we have a few letters representing longer words or a series of words. The most obvious idea is to keep a history of how each sentence was generated. This method's strength is in adapting to the user's specific vocabulary usage. Errors will occur if the shortcut used could apply to many different phrases. For example, "I am doing fine" is an output for the user typing in idf. Let's say that our user also wants to add the word ‘identify’ to a sentence. The user might try idf as a reasonable shortcut for the word. Will the system insert the word or the phrase? The artificial intelligence aspect of the system could reduce this kind of error: if the system understands context, these errors might not occur. In any case, the delete option will have to be accessible.
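The idf example can be made concrete with a small abbreviation table that keeps every candidate expansion and ranks them by how often the user has accepted each one. This is a minimal sketch under that assumption; the class and method names are hypothetical, and it does not attempt the context-based disambiguation mentioned above. It only shows how usage history decides which expansion is offered first while the other stays one selection away.

```python
class AbbreviationExpander:
    """Abbreviation table ranked by how often the user accepted each expansion."""

    def __init__(self):
        # abbreviation -> {expansion: times the user accepted it}
        self.table = {}

    def define(self, abbreviation, expansion):
        """Register an expansion without counting it as a use."""
        self.table.setdefault(abbreviation, {}).setdefault(expansion, 0)

    def accept(self, abbreviation, expansion):
        """Record that the user chose this expansion (the usage history)."""
        self.define(abbreviation, expansion)
        self.table[abbreviation][expansion] += 1

    def candidates(self, abbreviation):
        """All expansions, most often accepted first; ties keep definition order."""
        options = self.table.get(abbreviation, {})
        return sorted(options, key=lambda expansion: -options[expansion])


expander = AbbreviationExpander()
expander.define("idf", "I am doing fine")
expander.define("idf", "identify")
print(expander.candidates("idf"))   # ['I am doing fine', 'identify']
expander.accept("idf", "identify")
print(expander.candidates("idf"))   # ['identify', 'I am doing fine']
```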

In word prediction, there are several possible behaviors for the delete key. As with abbreviation expansion, the goal may be to delete on a word-by-word basis, or to return to the point before the prediction was completed. The hard part comes when handling errors in natural language processing (syntax analysis). The whole structure of a sentence can be changed by an error in input, so we must decide how far back to go in order to catch the actual error without making the system so hard to use that the time savings are lost. A simple sentence structure design is the initial focus. The user can choose a subject, verb, and object. At any time before they press ‘Done’, the user can go back to any of the three words and change it. It appears that the more sophisticated the system becomes, the more complex its error handling becomes. This will need to be addressed at each stage of adding functionality.
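The subject-verb-object design with a ‘Done’ button can be sketched as three independent slots: choosing a word for a filled slot replaces only that slot, and nothing is sent until the user presses Done. The names below (`SentenceBuilder`, `choose`, `done`) are hypothetical, and the sketch deliberately ignores the fact that, in a grammar-driven menu, changing the subject could invalidate the verb and object lists; that check is exactly the extra error-handling complexity the paragraph above anticipates.

```python
class SentenceBuilder:
    """Subject, verb and object slots that stay editable until Done is pressed."""

    SLOTS = ("subject", "verb", "object")

    def __init__(self):
        self.words = {slot: None for slot in self.SLOTS}
        self.sent = False

    def choose(self, slot, word):
        if self.sent:
            raise RuntimeError("sentence already sent; start a new one")
        if slot not in self.words:
            raise ValueError(f"unknown slot: {slot}")
        # Replacing one slot leaves the other two untouched, so an error in
        # the verb does not force the user to re-enter the subject or object.
        self.words[slot] = word

    def done(self):
        missing = [slot for slot in self.SLOTS if self.words[slot] is None]
        if missing:
            raise ValueError("missing: " + ", ".join(missing))
        self.sent = True
        return " ".join(self.words[slot] for slot in self.SLOTS)


builder = SentenceBuilder()
builder.choose("subject", "I")
builder.choose("verb", "want")
builder.choose("object", "water")
builder.choose("verb", "need")      # fix the verb before pressing Done
print(builder.done())               # I need water
```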

For sentence compansion, there is a clear way of handling errors. The software should remember the transformations from content words to richer, grammatically correct sentences. The delete key would allow a return to the content words for further editing or for a different choice of translation. At this point, however, we return to the question of whether deletion should operate on words or on characters. Since compansion can relate either to word prediction or to semantic compaction, the methods employed there should also be employed here.
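Remembering each transformation is what makes "delete returns to the content words" possible. The sketch below stores every (content words, sentence) pair and, on delete, pops the last one and hands the content words back for editing. The expansion function `toy_expand` is a hypothetical stand-in with one naive agreement rule; a real compansion engine would be far richer, and only the history mechanism is the point here.

```python
from dataclasses import dataclass


@dataclass
class Transformation:
    content_words: list   # what the user actually selected
    sentence: str         # the expanded sentence that was produced


def toy_expand(words):
    """Stand-in for a real compansion engine: agreement, article, punctuation."""
    subject, verb, obj = words
    if subject not in ("i", "you", "we", "they"):
        verb += "s"                      # naive third-person agreement
    return f"{subject.capitalize()} {verb} the {obj}."


class CompansionHistory:
    def __init__(self, expand):
        self.expand = expand
        self.log = []                    # remembered transformations, newest last

    def generate(self, content_words):
        sentence = self.expand(content_words)
        self.log.append(Transformation(list(content_words), sentence))
        return sentence

    def delete(self):
        """Undo the last expansion and hand its content words back for editing."""
        return self.log.pop().content_words


history = CompansionHistory(toy_expand)
print(history.generate(["mother", "read", "book"]))   # Mother reads the book.
print(history.delete())                               # ['mother', 'read', 'book']
```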

Conversational momentum currently relies on input that has already been constructed. If the conversation partner is present while the sentence is being constructed, they see the output as soon as the user selects a word. We may want to add an "are you sure" button to ensure that the user has not made an error in selection, but this could be cumbersome, especially for users with good control of their movements. Since the preferred conversation will be generated on the fly, contextual information will be an important part of the application. We will need to rely on a combination of contextually based word prediction methods and a good understanding of the structure of conversation to assist the user as much as possible.

There is another question we must ask, and hopefully answer, for any of the above methods. Often we notice an error in input only a few keystrokes or clicks after it has been made. Can we allow our users to "move back" through input to fix errors while leaving other portions of input intact? The level of complexity involved will depend on the method being used, but in each case the task seems nontrivial. Moreover, does this capability hinder efficiency in any way? It seems that we must strike a balance between user comfort with the system and known efficiency concerns. Training and practice will be key in handling errors. Need Jen’s input here for better insight into error correction.

We see that there are a number of unresolved questions about error handling in the input methods described. The key for any system, though, will be the option for customization on a per-user basis. Since our patients may vary widely in how they approach communication and manipulate input, the ability to catch errors common to a particular user will directly affect the effectiveness of the system. Further research is definitely recommended.

Design Ideas


It seems that finding the right combination of strategies is the key to improving the speed and accuracy of text entry for these patients. What is the right combination? Ultimately, only the patients can tell us. As researchers, we must consider the total design space and do our best to implement an application that can then be tested for usefulness and usability. The following design concepts are being considered:

  1. Jim Kitchen’s software application

The structure of this application is based on word prediction from a grammatical perspective. There are three main menus: Subject, Verb, and Object. The list of verbs displayed depends on the subject chosen, and the list of objects depends on the subject and verb chosen. The application does not have logical control (the ability to move discretely from one selectable item to the next); instead, the cursor moves from pixel to pixel.

 

            Pros

·        Low amount of entry required

·        Relies on use of natural language

            Cons

·        Limited unique expression

·        Might lead the user to choose predicted words instead of the intended ones

·        Lacks sophisticated sentence structure

·        Relies on accuracy to improve speed
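The dependent menus in this design can be pictured as two lookup tables: one keyed by the subject, the other by the (subject, verb) pair. The vocabulary below is invented for illustration and does not reproduce Jim Kitchen's actual lists; the function names are hypothetical.

```python
# Hypothetical vocabulary for illustration only.
VERBS_BY_SUBJECT = {
    "I": ["want", "need", "feel"],
    "You": ["can", "should"],
}
OBJECTS_BY_SUBJECT_VERB = {
    ("I", "want"): ["water", "to sleep", "the TV on"],
    ("I", "need"): ["help", "the nurse"],
    ("I", "feel"): ["cold", "hot", "tired"],
}


def verb_menu(subject):
    """Verbs offered once a subject has been chosen."""
    return VERBS_BY_SUBJECT.get(subject, [])


def object_menu(subject, verb):
    """Objects offered once both the subject and the verb have been chosen."""
    return OBJECTS_BY_SUBJECT_VERB.get((subject, verb), [])


print(verb_menu("I"))              # ['want', 'need', 'feel']
print(object_menu("I", "feel"))    # ['cold', 'hot', 'tired']
print(object_menu("You", "can"))   # []
```

The empty result for ("You", "can") illustrates the "Limited unique expression" con: anything missing from the tables cannot be said, which is the gap the "Not Found" option in the Current Direction section is meant to close.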

 

 

  2. T9 with word prediction

T9 ("Text on 9 keys") maps the letters of the alphabet onto the number keys of a telephone keypad. For example, on a standard keypad, pressing the number 2 could represent A, B, or C. This mapping is printed on most telephones. If this is combined with word prediction, a sequence of key presses can be resolved into the most plausible intended word. If an application is developed with logical control, there would be only nine buttons to choose from.

 

            Pros

·        Low amount of entry required

·        Symmetrical keypad with few choices (9)

·        Simple with logical control

            Cons

·        Error handling

·        Number to letter translation

            Data Required

·        Current raw signal data
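The combination of T9 and word prediction can be sketched with the standard keypad layout (letters on keys 2 to 9) and a table of how often the user has used each word. Words whose spelling uses exactly the typed keys are offered first, followed by longer completions, each group ordered by frequency. The word counts are hypothetical, and this ranking rule is one simple choice among many, not a description of the commercial T9 product.

```python
# Letters on a standard telephone keypad (keys 2-9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def key_sequence(word):
    """The digits a user presses to spell `word` (letters only)."""
    return "".join(LETTER_TO_KEY[letter] for letter in word.lower())


def predict(word_counts, keys):
    """Candidate words for a key sequence, combining T9 with word prediction.

    Words whose spelling uses exactly these keys come first; longer words that
    start with these keys (completions) follow. Within each group, words the
    user has used more often come first.
    """
    matches = [w for w in word_counts if key_sequence(w).startswith(keys)]
    return sorted(matches, key=lambda w: (len(w) != len(keys), -word_counts[w]))


# Hypothetical usage counts from the user's own text history.
counts = {"good": 50, "home": 40, "gone": 10, "hood": 2, "in": 80, "go": 30, "golf": 5}
print(predict(counts, "4663"))   # ['good', 'home', 'gone', 'hood']
print(predict(counts, "46"))     # ['in', 'go', 'good', 'home', 'gone', 'golf', 'hood']
```

The ambiguity of "4663" shows where the "Error handling" con comes from: four different words share one key sequence, so the user still needs a way to step through candidates.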

 

     3. Virtual Keyboard with Compansion and Abbreviation Expansion

         A Virtual Keyboard application is currently in use. It is a simple display of equal-sized letters of the alphabet and a few other buttons such as space, comma, and period. If the keyboard were recoded to implement logical control, and the compansion and abbreviation expansion strategies were incorporated into it, the effort needed to type in text could be greatly reduced.

            Pros

·        Low amount of entry required

·        Natural language

·        Familiar keyboard

·        Unlimited unique expression

            Cons

·        Error handling

·        Confusion of text entry feedback (Can they understand their errors?)

·        Can others understand what they are trying to say?

            Data Required

·        Grammar and syntax of the English language

·        Current raw signal data
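Logical control, as opposed to pixel-by-pixel movement, means each command moves the focus exactly one key, so the number of commands needed is predictable and an extra command costs one key rather than an unpredictable drift. The sketch below assumes four directional commands and a hypothetical layout; the real Virtual Keyboard's layout and command set are not specified here.

```python
# Hypothetical layout: letters plus the comma, period and space keys.
LAYOUT = [
    list("abcdefg"),
    list("hijklmn"),
    list("opqrstu"),
    list("vwxyz") + [",", "."],
    ["space"],
]


class LogicalFocus:
    """Focus moves one key per command instead of drifting pixel by pixel."""

    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, layout):
        self.layout = layout
        self.row, self.col = 0, 0

    def move(self, direction):
        d_row, d_col = self.MOVES[direction]
        # Clamp to the grid: an extra command never pushes focus off the keyboard.
        self.row = min(max(self.row + d_row, 0), len(self.layout) - 1)
        self.col = min(max(self.col + d_col, 0), len(self.layout[self.row]) - 1)

    def selected(self):
        return self.layout[self.row][self.col]


focus = LogicalFocus(LAYOUT)
for command in ("right", "right", "down"):
    focus.move(command)
print(focus.selected())   # j
```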

 

     4. Letter Gestures with Word Prediction

        This idea lies further in the future, but it should be discussed so that the ultimate goal is clear. The concept of letter gestures is to train the patient to imagine moving their hand to make a sign-language letter. Making the imagined letter would produce a unique neural signal, which would in turn produce that letter on the computer screen. If this can be accomplished with decent accuracy, the patient would have the most control of any of the designs considered here.

            Pros

·        More control

·        More speed

·        Unique expression

            Cons

·        Possible errors

            Data Required

·        Distinct, reliably different signal patterns for each letter
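The "Data Required" item can be illustrated with the simplest possible decoder: a nearest-centroid classifier that averages training examples per letter and rejects a signal that is not close enough to any letter. This is only a sketch of the requirement, not a proposal for decoding motor-cortex signals; the two-dimensional feature vectors, the distance threshold, and the function names are all invented for illustration. Returning None instead of a guess is one way to address the "Possible errors" con.

```python
import math


def train_centroids(samples):
    """Average the training feature vectors recorded for each letter."""
    centroids = {}
    for letter, vectors in samples.items():
        dims = len(vectors[0])
        centroids[letter] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    return centroids


def classify(centroids, features, max_distance):
    """Nearest-centroid letter, or None when no letter is close enough."""
    best_letter, best_distance = None, math.inf
    for letter, centre in centroids.items():
        distance = math.dist(features, centre)
        if distance < best_distance:
            best_letter, best_distance = letter, distance
    # Rejecting an ambiguous signal avoids typing a wrong letter.
    return best_letter if best_distance <= max_distance else None


# Made-up two-dimensional features; real neural features would differ.
samples = {"a": [[1.0, 0.0], [1.2, 0.2]], "b": [[0.0, 1.0], [0.2, 1.2]]}
centroids = train_centroids(samples)
print(classify(centroids, [1.0, 0.1], max_distance=0.5))   # a
print(classify(centroids, [0.6, 0.6], max_distance=0.5))   # None
```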

 

     5. Supporting conversation

            Interruptions

                        Positive – respond leaving control with other person

                        Negative – take control back

            Feedback – Is the response time appropriate?

            Context

                        Where are you in the conversation?

                                    Greeting

                                    Body

                                    Farewell

                        Frequency

                                    Recurrent effect for prediction
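One reading of the context and frequency items above is a predictor that keeps separate phrase counts for each stage of a conversation (greeting, body, farewell) and ranks suggestions by how often the user has used them in that stage, so repeated use raises a phrase's future rank. This interpretation of the "recurrent effect" and the class name `StagePredictor` are assumptions made for this sketch.

```python
from collections import defaultdict

STAGES = ("greeting", "body", "farewell")


class StagePredictor:
    """Suggest phrases for the current stage, most frequently used first."""

    def __init__(self):
        # stage -> phrase -> number of times the user has said it in that stage
        self.counts = {stage: defaultdict(int) for stage in STAGES}

    def record(self, stage, phrase):
        self.counts[stage][phrase] += 1   # repeated use raises future ranking

    def suggest(self, stage, limit=3):
        phrases = self.counts[stage]
        return sorted(phrases, key=lambda phrase: -phrases[phrase])[:limit]


predictor = StagePredictor()
predictor.record("greeting", "Hello")
predictor.record("greeting", "Good morning")
predictor.record("greeting", "Good morning")
predictor.record("farewell", "See you tomorrow")
print(predictor.suggest("greeting"))   # ['Good morning', 'Hello']
print(predictor.suggest("farewell"))   # ['See you tomorrow']
```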

 

     6. The Avatar/Companion

            Avatar – the application represents the user in the communication mode

            Companion – the application represents an assistant to help the user to communicate.

The logistics of the hardware in the room are important when implementing this idea (Figure 1). There should be two monitors showing information. The monitor that faces the patient should show the following information:

·        The virtual keyboard

·        The sentence formation area

·        The environmental control area

·        The mood selector (Figure 2a)

·        The human body communicator (Figure 2b)

 

 

The monitor that faces the participant should show the following information:

·        The sentence formation area

·        The environmental control ‘yes’ actions as they are occurring

·        Output from the human body communicator

 

 

In reality, the various ideas are just pieces of the whole design. The task at hand is to choose the right combination for the user population. The chosen combination will then need to be tested to validate the decision.

 

Current Direction


The current direction is to develop an application that uses Jim Kitchen’s design as a starting point and makes it more usable and useful. The current structure of the program is simple and supports natural language entry. The idea is to keep the strengths of the program and add features that make it more robust. The following elements are being considered in the design:

·        Add “Not Found” and “more” options to every menu, giving the user the opportunity to type in a unique word.

·        Any unique word that needs to be typed in will use the abbreviation expansion strategy to help produce words quickly.

·        Logical control will be implemented.

·        The application will have a main menu that informs the sub-menus of the context of the text to be typed.

·        Feedback must be considered at every step of the application.

·        An emotional state indicator will be added to tell the prediction system the mood of the text it should suggest.

·        A general diagram of the human body could be added to support word prediction.

·        Implement redundant modalities of output and input for better feedback and generalization to other populations.
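The "Not Found" and "more" options can be sketched as a paging function that appends the extra choices to each screen of a menu. The page size is an assumption, and showing "More" only when another page exists is a design choice made for this sketch; the list above calls for both options in every menu, which a wrap-around "More" would satisfy instead.

```python
PAGE_SIZE = 4   # assumed number of word choices that fit on one screen


def menu_page(words, page):
    """One screen of choices, followed by the extra menu options."""
    start = page * PAGE_SIZE
    choices = words[start:start + PAGE_SIZE]      # slicing copies the list
    if start + PAGE_SIZE < len(words):
        choices.append("More")                    # further words on the next page
    choices.append("Not Found")                   # switch to typing a unique word
    return choices


verbs = ["want", "need", "feel", "like", "see", "hear"]
print(menu_page(verbs, 0))   # ['want', 'need', 'feel', 'like', 'More', 'Not Found']
print(menu_page(verbs, 1))   # ['see', 'hear', 'Not Found']
```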

 

Once the program is written, usability tests will need to be performed to determine the strengths and weaknesses of the design. Initially, a usability test will be performed on able-bodied users, who will be given a mock motor disability to get an idea of how it might affect the real user population. A good sample of text should be used to test the usability of the application. Can the application make communicating faster and clearer? One way of verifying this is to develop a scenario and create sample text containing things that Johnny Ray would need to say frequently. Can the application produce that conversation easily?
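"Faster" can be made measurable for these tests. Two common measures are keystroke savings (selections saved relative to spelling every character) and words per minute. The sketch below assumes a baseline of one selection per character, spaces included, and, in the example, that accepting an abbreviation expansion costs one extra selection; both assumptions would need to match the real interface.

```python
def keystroke_savings(text, selections):
    """Percentage of selections saved relative to spelling `text` in full.

    Assumed baseline: one selection per character, spaces included.
    """
    baseline = len(text)
    return 100.0 * (baseline - selections) / baseline


def words_per_minute(text, seconds):
    return len(text.split()) / (seconds / 60.0)


# "idf" (3 selections) plus one selection to accept the expansion.
phrase = "I am doing fine"
print(round(keystroke_savings(phrase, 4), 1))   # 73.3
print(words_per_minute(phrase, 120))            # 2.0
```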

 

Conclusion


We have a unique situation with our patient group. They have the mental ability to conduct conversation at "normal" rates, yet their severe physical limitations prevent any contact with a computing device other than directly through the brain. There is a clear need for communication methods that allow minimal input to generate maximum output while gracefully handling errors in input. These people should have an opportunity to communicate at the speed at which their brain operates. Research should be conducted that implements one or more of the above methods, or perhaps develops a method specific to our group. These people are very aware of their position. If we can provide even one way to improve their ability to communicate, they may develop ways on their own to communicate with others. The best research may very well come from our patients, and we should provide the tools for them to do so.

References


  1. melody-proposal.doc (internal research group document)
  2. Enhancing Scanning Input With Non-Speech Sounds by S.A. Brewster, V. Raty, and A. Kortekangas in ASSETS 1996 Vancouver, BC, Canada; p. 10-14
  3. Intelligent Word-Prediction to Enhance Text Input Rate by N. Garay-Vitoria and J. Gonzalez-Abascal in IUI 1997 Orlando, FL, USA; p. 241-244
  4. Iconic Language Design for People with Significant Speech and Multiple Impairments by P.L. Albacete, S.K. Chang, G. Polese, and B. Baker in ASSETS 1994 Marina del Rey, CA, USA; p. 23-20
  5. Some Interface Issues in Developing Intelligent Communication Aids for People with Disabilities by K.F. McCoy, P. Demasco, C. Pennington, and A.L. Badman in IUI 1997 Orlando, FL, USA; p. 163-170
  6. Generating Text from Compressed Input: An Intelligent Interface for People with Severe Motor Impairments by P.W. Demasco and K.F. McCoy in Communications of the ACM May 1992 Vol. 35, No. 5, p. 68-78
  7. Semantic Compaction Systems Company Website - Frequently Asked Questions page by R.V. Conti, J. Micher, and G. VanTatenhove with URL http://kaddath.mt.cs.cmu.edu/scs/faq.html
  8. Prediction and Conversational Momentum in an Augmentative Communication System by N. Alm, J.L. Arnott, and A.F. Newell in Communications of the ACM May 1992 Vol. 35, No. 5, p. 46-57
  9. Neural Signals, Inc. Website URL: http://www.neuralsignals.com
  10. Prentke Romich Company Website URL: http://www.prentrom.com

 

1a Cassell, J. Embodied Conversational Agents: Representation and Intelligence in User Interface in press, AI Magazine URL http:/ www.justine@media.mit.edu

2a King, T. (1999) Assistive Technology: Essential Human Factors by Allyn and Bacon p. 17-18

3a Edwards, A. (1995) Extra-Ordinary Human-Computer Interaction

4a Worden, A. et al. (1997) Making Computers Easier for Older Adults to Use: Area Cursors and Sticky Icons. In Proceedings of CHI’97, pp. 266-271

5a Mankoff, J. et al. (2000) Interaction Techniques for Ambiguity Resolution in Recognition-Based Interfaces

6a Mynatt, E. and Edwards, A. (2001) Tutorial: Designing for Users with Special Needs. CHI’01

Notes


This report can be found online at http://swiki.cc.gatech.edu:8080/brain-ui. The original author is Duke Hutchings, who can be reached by e-mail at hutch@cc.gatech.edu.