Speech transcription and analysis system and method
Transcript
US006714911B2

(12) United States Patent
Waryas et al.
(10) Patent No.: US 6,714,911 B2
(45) Date of Patent: Mar. 30, 2004

(54) SPEECH TRANSCRIPTION AND ANALYSIS SYSTEM AND METHOD
(75) Inventors: Carol Waryas, San Antonio, TX (US); [...]; Laurie Labbe, San Antonio, TX (US)
(73) Assignee: Harcourt Assessment, Inc., San Antonio, TX (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 09/999,249
(22) Filed: Nov. 15, 2001
(65) Prior Publication Data: US 2002/0120441 A1, Aug. 29, 2002

Related U.S. Application Data
(63) Continuation-in-part of application No. 09/769,776, filed on Jan. 25, 2001, which is a continuation-in-part of application No. 09/770,093, filed on Jan. 25, 2001.

(51) Int. Cl.7: G10L 21/06; G09B 21/00
(52) U.S. Cl.: 704/271; 704/220
(58) Field of Search: 704/503, 3, 276, 271, 270, 268, 267, 260, 256, 254, 249, 240, 231, 211, 200; 434/362, 185

(56) References Cited

U.S. PATENT DOCUMENTS
4,969,194 A 11/1990 Ezawa et al.
5,487,671 A 1/1996 [...]
5,562,453 A 10/1996 Wen
5,636,325 A 6/1997 Farrett
5,679,001 A 10/1997 Russell et al.
5,717,828 A 2/1998 Rothenberg
5,791,904 A 8/1998 Russell et al.
5,813,862 A 9/1998 Merzenich et al.
5,865,626 A 2/1999 Beattie et al.
5,927,988 A 7/1999 Jenkins et al.
6,009,397 A 12/1999 Siegel
6,019,607 A 2/2000 Jenkins et al.
6,030,226 A 2/2000 Hersh
6,055,498 A 4/2000 Neumeyer et al.
6,071,123 A 6/2000 Tallal et al.
6,077,085 A 6/2000 Parry et al.
6,113,393 A 9/2000 Neuhaus

FOREIGN PATENT DOCUMENTS
EP 0 360 909 4/1990
EP 0 504 927 9/1992
EP 1 089 246 4/2001
WO 99/13446 3/1999

OTHER PUBLICATIONS
Bernthal, John, et al., "Articulation and Phonological Disorders," 1998, Allyn & Bacon, 4th Edition, pp. 233-236, 292.*
Jackson, Peter, "Introduction to Expert Systems," 1999, Addison Wesley Longman Limited, 3rd Edition, pp. 207-210.*
Parrot Software User's Manual, "Automatic Articulation Analysis 2000," Parrot Software, Inc.*
Shneiderman, John, "Designing the User Interface," 1998, Addison Wesley Longman Limited, 3rd Edition, pp. 82-83.*
American Speech-Language-Hearing Association, Technology 2000: Clinical Applications for Speech-Language Pathology, http://professional.asha.org/techiresources/tech2000/7.htm, pp. 1-7, 1996.
Additional Childes Tools, Childes Windows Tools, http://childes.psy.cmu.edu/html/wintools.html.
Sails, the Speech Assessment & Interactive Learning System (SAILS™), Using SAILS in Clinical Assessment and Treatment, http://www.propeller.net/react/sails2.htm, pp. 1-3.
GFTA-2: Goldman-Fristoe Test of Articulation-2, http://www.agsnet.com/templates/productviewip.asp?GroupID=a11750, pp. 1-3.
KLPA: Khan-Lewis Phonological Analysis, http://www.agsnet.com/templates/productviewip.asp?GroupID=a1820, pp. 1-2.
Bernthal, John E., and Bankson, Nicholas W. (Eds.), Articulation and Phonological Disorders, Fourth Edition, Chapter 9, Instrumentation in Clinical Phonology, by Julie J. Masterson, Steven H. Long, and Eugene H. Buder, 1998, pp. 378-406.
Masterson, Julie and Pagan, Frank, "Interactive System for Phonological Analysis User's Guide," pps 41, Harcourt Brace & Company, San Antonio, 1993.
Long, Steven H. and Fey, Marc E., "Computerized Profiling User's Manual," pps 119, Harcourt Brace & Company, San Antonio, 1993.
PictureGallery, http://www.psychcorp.com/catalogs/sla/sla014atpc.htm, pp. 1-2.
The Childes System, Child Language Data Exchange System, http://childes.psy.cmu.edu.
* cited by examiner

(74) Attorney, Agent, or Firm: Allen, Dyer, Doppelt, [...]

(57) ABSTRACT
A transcription method uses a computerized process to prompt a student to produce at least one phoneme orally. Next, a correct and at least one incorrect production of the phoneme are displayed. The therapist selects from among the productions based upon the student-produced phoneme.
The system includes a processor and display to prompt a student to produce at least one phoneme orally and to display a correct and at least one incorrect production of the phoneme. The therapist then uses an input device in signal communication with the processor to select from among the displayed correct and incorrect productions based upon the student-produced phoneme, thus obviating the need for the therapist to enter the incorrect production symbol by symbol, unless it is desired to do so, or unless the actual production is not found among the displayed production selections.

37 Claims, 25 Drawing Sheets

[Front-page drawing: the flowchart of FIG. 6A.]

U.S. Patent  Mar. 30, 2004  Sheet 1 of 25  US 6,714,911 B2
[Flowchart: 100 open professional version of system; 101 provide access to database of records; 102 provide access to demographic data; 103 select problem speech sound; 104 decision (no/yes); 105 apply no filter; 106 apply filter to limit set; 107 search database to create record set; 108 sort set of records into desired sequence.]
FIG. 1A.

Sheet 2 of 25
[Flowchart, steps 110-120: store/transmit; use personal version of system; select display style; present record; prompt student for pronunciation; therapist scores pronunciation; hear word pronounced (decision); broadcast word; calculate aggregate score; store current score; calculate historical change; calculate statistics.]
FIG. 1B.

Sheet 4 of 25
[Flowchart: 501 select type of analysis; 502 present symbol to user; 503 prompt user to pronounce word/perform narration; 504 enter phonetic representation; 505 apply dialectical filter; 507 automatically categorize the error; 508 if desired, display frequency spectrum.]
FIG. 3A.
[Flowchart, continued: 509 if desired, broadcast correct pronunciation; 511 perform correlation of errors to make diagnosis; 512 issue report; 513 save error in database; 514 determine change over time; 515 issue report; 516 recommend therapeutic program.]
FIG. 3B.

Sheet 5 of 25
[Flowchart: 502' present symbol to user on display in communication with processor; 503 prompt user to pronounce word/perform narration; 520 enter phonetic representation into separate input device; 521 download entered phonetic representation into processor.]
FIG. 4.
[Block diagram: student 12; system 10; operator input and storage device; therapist 11.]
FIG. 5.

Sheet 6 of 25
[Flowchart: 601 enter student and therapist information; 602 select type of administration; 603 initiate phonemic profile; 604 initiate connected speech sample; 605 present stimulus to student; 606 present target and incorrect productions; 607 decision; 608 select among displayed options; 609 enter production in IPA; 619 present stimulus to student; 620 determine intended target sentence; 621 enter target production; 622 conversion to IPA; 623 display target production; 624 edit to actual production; phonemic profile complete (decision); 611 automatic analysis performed.]
FIG. 6A.

Sheet 7 of 25
[Flowchart, continued, steps 612-618: 612 output analysis; 613 apply age and/or dialect filter; 614 output analysis with filter applied; prepare parent letter and treatment recommendations; stimuli, transcription, analysis; prepare report, letter, treatment recommendations.]

Sheet 10 of 25
[Current Date]
Dear [Caregiver's Name]:
RE: Client's Name [First and Last Name]
[Client's First Name] was tested on [Date of Administration] with the Computerized Articulation and Phonology Evaluation System to see what sounds he/she [based on SEX] is able and unable to say. [Client's First Name] was shown, for example, a photograph of a shoe on the computer screen and was asked to tell me what the photo was. This was not a test to determine if [Client's First Name] knew the word, but rather how he/she [based on SEX] said it.
The results of this evaluation indicate whether [Client's First Name] is able to say all the sounds that are expected at his/her [based on SEX] age. The evaluation also indicates whether [Client's First Name] is using sounds early for his/her [based on SEX] age. [If the age filter is turned on and if the client is less than 10 years of age.]
Here are [Client's First Name]'s results:
These are the sounds your child is able to say correctly, which are not expected at his/her [based on SEX] age: [If the age filter is turned on and if client is less than 10 years of age]
[For example: a table with a "Sound" column; the remaining column headings are illegible in the source.]
These are the sounds with which [Client's First Name] is having difficulty:
[For example: a table with a "Sound" column; the remaining column headings are illegible in the source.]
[All letters in the "Sound:" column should be in orthographic, small letters, not in CAPS]
Note: Results reflect application of the age filter [If age filter was used]. Results reflect application of the dialect filter [If dialect filter was used]. [Do not print any note text if filters were not used.]
[Print the following if there are no sounds with which the client is having difficulty:] [Client]'s test results do not indicate that he/she [based on SEX] has difficulty with articulation and/or phonology at this time.
FIG. 9.

Sheet 11 of 25
[Screen display, "Report Options": a tree under "Description of Client's Productions Reports" listing Word Length Inventories; Stress Pattern Inventories; Word Shape Inventories; and Consonant Inventories. Consonant Inventories expands by position (Initial, Medial) into Consonants (Count; Count and Words), Consonants By PVM Feature, and Consonants By Nonlinear Feature. Consonants By PVM Feature expands into Single Feature (Place, Voice, Manner, All) and Two Features (Place-Manner, Place-Voice, Voice-Place, Voice-Manner, Manner-Place, Manner-Voice), each with Count and Count and Words. Buttons include Back, Next, and Preview.]
FIG. 10.

Sheet 12 of 25
Treatment Suggestions for [Client's First Name and Last Name] [mm/dd/yyyy - Administration date]
This report is fairly general in its approach to final goal selection because there is considerable controversy in the field of speech-language pathology about articulation and phonology treatment. In addition, it is important to consider the whole client, in his or her environment, and in the context of other communication or learning needs and styles when determining the goals of treatment. Be sure to select the option of an age comparison (this would have to be set in the Preferences) and/or a dialect filter (this would have to be set in the Demographics screen) if you choose to take into account those considerations in goal selection.
I. Word Shape Goals
Your client is having difficulty with the shape of words. The priority at this point in development should be to strengthen the basic word structures of language. The following is a list of word shapes to target in treatment:
[System displays word shapes with a percent match less than 60% from Comparison of Client's Production and Target Forms.]
CVCVC  CVC  CCVCC
When targeting these word shapes, it is advisable to use sounds that are in your client's current inventory. For example, if a child uses only CV syllables and [p], [m], and [n] word initially, you can create CVC words such as 'pop,' 'mop,' 'pan,' etc.
[System displays the following paragraph if no word shape has a match rate of less than 60%. Previous two paragraphs with the 60% or less match data is not printed.]
Your client shows adequate word shape development at the present time; however, there are many segmental substitutions. Treatment should focus on segments in all word positions. Consider three to four major sound classes varying in place, manner and voicing.
Your client produced these sounds with a relatively high degree of accuracy in the noted positions:
[System displays consonant segments produced with at least 70% match in at least one word position from the segmental part of the Comparison of Client's Production and Target Forms.]
Initial: T (70%), C (95%), P (99%)
Medial: T (100%), P (76%), G (85%), BL (75%)
Final: T (90%), P (71%)
FIG. 11A.

Sheet 13 of 25
[Print this paragraph if the match rate of >70% does not apply. Previous paragraph and table are not printed.]
No segments meet the 70% criterion for match. For phonemes to utilize for word shape goals, select phonemes with the highest match below 70% (as identified in the analyses comparing client and target productions).
Your client seems to overuse the following phonemes. He/she should be encouraged to use sounds other than these:
[If results from Description of Client's Production indicate that one consonant segment occurs over 25 times in the word initial position, over 12 times in the medial position, or over 17 times in the final position. If none of the segments meet these criteria, omit this section including the previous paragraph.]
Initial T  Medial  Final T P
[Print this on all IPE Level 1 Treatment Suggestions]
It is usually advisable at this stage of development to avoid targeting voiced stops in the final position unless your client already shows use of these sounds in the final position.
II. Segmental and/or Feature Goals
Your client might also benefit from treatment on one or two new sounds or features in the first period of treatment. The following is a list of sounds and features that were produced with less than 60% accuracy:
[From the segmental part of the Comparison of Client's Production and Target Forms:
a. List all target consonant segments and consonant sequences by position with percent match less than 60%.
b. List all place, voice, manner features with percent match less than 60%.
c. List all nonlinear features with percent match less than 60%.
**List all phonemes in IPA characters]
Segments
[Example table of target segments by Initial, Medial, and Final position with percent match, largely illegible in the source. Legible entries include Initial k (50%), Medial k (55%), and Final f (0%); the remaining rows cover v, th, s, z, sh, ch, l, and r.]
FIG. 11B.

Sheet 14 of 25
Place-Voice-Manner Features
Place [Initial, Medial, Final]: Labiodental (0%) and Dental (0%) in all three positions; Palatal and Velar entries partly illegible (legible values include Palatal 25% and 34%, Velar 43%).
Voice: Initial (none); Medial Voiced (35%); Final Voiceless (59%).
Manner [Initial, Medial, Final]: Fricative (0%) initial, Fricative (0%) medial, Fricative (9%) final; Affricate and Liquid entries partly illegible (legible values include Liquid 45%).
Nonlinear Features
Manner [Initial, Medial, Final]: legible entries include Consonantal + and Continuant +; values largely illegible.
FIG. 11C.

Sheet 15 of 25
Oral Place [Initial, Medial, Final]: legible entries include Labial, Coronal Anterior + (15%), Coronal Anterior - (0%), Labiodental +, and Dorsal Anterior - (10%); remaining values illegible.
Pharyngeal Place [Initial, Medial, Final]: legible entries include Advanced Tongue Root + (58%), Advanced Tongue Root - (13%), and Advanced Tongue Root + (14%); remaining values illegible.
[For each sound listed under the Target mismatch sounds, system will check the Description of Client's Production results to see whether that sound was present or not. List those sounds that WERE used at least once in any position.
If none of the sounds meet the criterion, omit this section including the following 3 paragraphs]
The following target sounds were present in your client's phonetic inventory, which indicates that she or he is able to produce that sound or feature:
Initial  Medial  Final  T  p
If you choose to target these sounds in treatment, you might consider using minimal pairs in order to encourage further use of the sound or feature by providing a communicative incentive. Consult the Comparison of Client's Production and Target Forms Report to determine which sounds are used most often as substitutions for the target sound class and use those sounds in minimal pair contrasts.
Although the target sounds were present in the inventory, it is possible that additional drill and practice will be necessary to establish automaticity of production.
[For each sound listed under the Target mismatch sounds, system will check the Description of Client's Production results to see whether that sound was present or not. List those sounds that WERE NOT used at least once in any position. If none of the sounds meet the criterion, omit this section including the following 3 paragraphs]
The following sounds were absent from your client's phonetic inventory:
Target
FIG. 11D.

Sheet 16 of 25
If you choose to target these sounds in treatment, you may first need to teach the client how to produce these sounds, and then work to establish automaticity. You may wish to conduct stimulability testing for these sounds and use the results to select the sounds to target first in treatment.
Target the sounds or features in word positions that the client already uses well (e.g., initial position if your client uses CV well). It is often advisable to address problematic sequences of sounds or features after sequences that show no mismatches.
[Print this on all IPE Level 1 Treatment Suggestions]
Choose sounds that differ in both place and manner and focus on sound class category (e.g., [v] and [s] as fricatives, rather than just [s], or [k] and [g] as velars, rather than just [k]) to establish a ma[...] basis for change.
Additionally, you may wish to consider whether other accurately produced sounds have features in common with the mismatched sounds. It may be possible to teach the new sounds by extension from the already produced sounds (e.g., from /t/ to /k/, from /s/ to /z/, etc.).
FIG. 11E.

Sheet 17 of 25
Treatment Suggestions for [Client's First Name and Last Name] [mm/dd/yyyy - Administration date]
This report is fairly general in its approach to final goal selection because there is considerable controversy in the field of speech-language pathology about articulation and phonology treatment. In addition, it is important to consider the whole client, in his or her environment, and in the context of other communication or learning needs and styles when determining the goals of treatment. Be sure to select the option of an age comparison (this would have to be set in the Preferences) and/or a dialect filter (this would have to be set in the Demographics screen) if you choose to take into account those considerations in goal selection.
I. Word Shape Goals
Your client is having difficulty with more complex structures of words that involve consonant sequences (blends) and longer sequences of consonants and vowels. The following is a list of word shapes to target in treatment:
[System displays word shapes with a percent match less than 60% from Comparison of Client's Production and Target Forms.]
CVCVC  CVC  CCVCC
When targeting these word shapes, it is often advisable to use sounds that are in your client's current inventory.
[System displays the following paragraph if no word shape has a match rate of less than 60%.
Previous two paragraphs with the 60% or less match data is not printed.]
Your client shows adequate word shape development at the present time; however, there are many segmental substitutions. Treatment should focus on segments in all word positions. Consider three to four major sound classes varying in place, manner and voicing.
Your client produced these sounds with a relatively high degree of accuracy in the noted positions:
[System displays consonant segments produced with at least 70% match in at least one word position from the segmental part of the Comparison of Client's Production and Target Forms]
Initial: T (79%), C (99%), P (100%)
Medial: T (99%), P (85%), G (70%), BL (85%)
Final: T (100%), P (100%)
FIG. 12A.

Sheet 18 of 25
[Print this paragraph if the match rate of >70% does not apply. Previous paragraph and table are not printed.]
No segments meet the 70% criterion for match. For phonemes to utilize for word shape goals, select phonemes with the highest match below 70% (as identified in the analyses comparing client and target productions).
Your client seems to overuse the following phonemes. He/she should be encouraged to use sounds other than these:
[If results from Description of Client's Production indicate that one consonant segment occurs over 25 times in the word initial position, over 12 times in the medial position, or over 17 times in the final position. If none of the segments meet these criteria, omit this section including the previous paragraph.]
Initial  Medial  Final  [entries illegible in the source]
II. Segmental and/or Feature Goals
Your client also shows limitations in speech sound development. The following is a list of sounds and features that were produced with less than 60% accuracy:
[From the segmental part of the Comparison of Client's Production and Target Forms:
a. List all target consonant segments and consonant sequences by position with percent match less than 60%.
b. List all place, voice, and manner features with percent match less than 60%.
c. List all nonlinear features with percent match less than 60%.
**List all phonemes in IPA characters.]
1. Segments
[Example table of target segments by Initial, Medial, and Final position with percent match, largely illegible in the source. Legible entries include Initial k (50%) and g (59%), Medial k (55%), and Final f (0%) and z (0%); the remaining rows cover v, th, s, sh, ch, l, and r.]
FIG. 12B.
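The selection rules quoted in the bracketed instructions of FIGS. 11A through 12B can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function names, the dictionary layout, and the sample values are hypothetical, and only the thresholds (less than 60% match, at least 70% match, and more than 25, 12, or 17 occurrences by word position) are taken from the figures.

```python
# Thresholds quoted in the bracketed report instructions of FIGS. 11A-12B.
LOW_MATCH = 60    # "percent match less than 60%"
HIGH_MATCH = 70   # "at least 70% match in at least one word position"
OVERUSE_LIMIT = {"initial": 25, "medial": 12, "final": 17}  # "occurs over N times"

POSITIONS = ("initial", "medial", "final")


def word_shape_goals(shape_match):
    """Return word shapes whose percent match is below 60%."""
    return [shape for shape, pct in shape_match.items() if pct < LOW_MATCH]


def by_position(segment_match, keep):
    """Group (segment, percent) pairs by word position, keeping those that pass `keep`."""
    grouped = {pos: [] for pos in POSITIONS}
    for (segment, pos), pct in segment_match.items():
        if keep(pct):
            grouped[pos].append((segment, pct))
    return grouped


def accurate_segments(segment_match):
    """Segments with at least 70% match, listed under each position where they qualify."""
    return by_position(segment_match, lambda pct: pct >= HIGH_MATCH)


def segment_goals(segment_match):
    """Segments with less than 60% match, listed by position."""
    return by_position(segment_match, lambda pct: pct < LOW_MATCH)


def overused_segments(counts):
    """Segments that occur more often than the per-position limit."""
    grouped = {pos: [] for pos in POSITIONS}
    for (segment, pos), n in counts.items():
        if n > OVERUSE_LIMIT[pos]:
            grouped[pos].append(segment)
    return grouped


# Hypothetical sample data, loosely modeled on the example tables in the figures.
shapes = {"CVCVC": 40, "CVC": 55, "CCVCC": 10, "CV": 95}
segments = {
    ("t", "initial"): 79,
    ("k", "initial"): 50,
    ("k", "medial"): 55,
    ("p", "final"): 100,
    ("f", "final"): 0,
}
counts = {("t", "initial"): 30, ("t", "medial"): 12, ("p", "final"): 18}

print(word_shape_goals(shapes))
print(accurate_segments(segments))
print(segment_goals(segments))
print(overused_segments(counts))
```

In this sketch a segment with exactly 12 medial occurrences is not flagged, because the figures say "over 12 times"; a report generator built on these helpers would omit a section whenever the corresponding list is empty, as the bracketed instructions direct.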