US 20020010574A1
(19) United States
(12) Patent Application Publication
Tsourikov et al.
(10) Pub. No.: US 2002/0010574 A1
(43) Pub. Date: Jan. 24, 2002

(54) NATURAL LANGUAGE PROCESSING AND QUERY DRIVEN INFORMATION RETRIEVAL
(76) Inventors: Valery Tsourikov, Boston, MA (US); Igor Sovpel, Minsk (BY); Leonid Batchilo, Belmont, MA (US)
Correspondence Address: STANGER & DREYFUS, 608 SHERWOOD PKWY, MOUNTAINSIDE, NJ 07092 (US)
(21) Appl. No.: 09/815,260
(22) Filed: Mar. 22, 2001
Related U.S. Application Data
(63) Non-provisional of provisional application No. 60/198,782, filed on Apr. 20, 2000.
Publication Classification
(51) Int. Cl.7: G06F 17/27
(52) U.S. Cl.: 704/9; 704/1; 707/3

(57) ABSTRACT
In a digital computer, the method of processing a natural language expression entered or downloaded to the computer that includes (1) identifying in the expression expanded subject, action, object components that includes at least four components, subject, action, object (SAO) components and at least one additional component from the class of preposition, indirect object, adjective, and adverbial eSAO components, (2) extracting each of the at least four components for designation into a respective subject, action, object field and at least a preposition field, indirect object field, adjective field, and adverbial field, and (3) using the components in at least certain ones of said fields for at least one of (i) displaying components to the user, (ii) forming a search pattern of a user request for information search of local or on-line databases, and (iii) forming an eSAO knowledge base. A constraint field can also be provided to accept non-classified components.
[Drawings, Sheets 1 to 8 of Patent Application Publication US 2002/0010574 A1, Jan. 24, 2002. Only captions and block labels are recoverable.]

FIG. 1 Structural and Functional Scheme of the Semantic Processor for User Request Analysis. Blocks: User request (2); Type of user request recognition (4); User request eSAO analysis (6); Search pattern generation (8); Output search patterns (10); Linguistic KB (12).

FIG. 2 Basic Types of the User Request. Blocks: User request; User request classification; Keywords (18); Bit sentence (20); Statement (command sentence) (22); Question sentence (24); Complex query (25); Linguistic KB.

FIG. 3 Structural and Functional Scheme of the User Request eSAO Processor (the case of "keywords"). Blocks: User request (keywords) (26); Request parsing (28); Parsed user request (30); Linguistic KB (12).

FIG. 4 Structural and Functional Scheme of the User Request eSAO Processor (the case of "bit", "command", "question" or "complex" query). Blocks: User request (bit, command, question or complex sentence) (34); Part of speech tagging (36); Recognition of introductory part of the query (37); Request parsing (38); Request converting (40); eSAO extraction (42); eSAO request (44); Linguistic KB.

FIG. 5 Structural and Functional Scheme of User Request Parser. Blocks: Tagged request (48); Verb chains recognition (50); Noun group recognition (52); Syntactical dependency tree construction (54); Parsed request (56); Linguistic KB.

FIG. 6 Structural and Functional Scheme of User Request Convertor. Blocks: Parsed request (60); Structure of "question" sentence recognition (62); Request converting (64); Question word substitution (65); Parsed converted request (66); Linguistic KB (12).

FIG. 7 Structural and Functional Scheme of User Request eSAO extractor. Blocks: Parsed or parsed converted request (70); Action recognition (72); Subject and object recognition (74); Attributes recognition (76); Constraints recognition (77); eSAO request (78); Linguistic KB (12).

FIG. 8 Query driven information search. Blocks: Sources of Documents to be processed (82); User Requests (84); Semantic eSAO Processing (86, 88); eSAO Structures Index of Source Documents, including links to documents (90); eSAO Search Patterns of User Requests (92); Comparative analysis of eSAO Search Patterns of User Requests and eSAO Structures of Source Documents (94); Relevant eSAO Structures of Source Documents (96).

NATURAL LANGUAGE PROCESSING AND QUERY DRIVEN INFORMATION RETRIEVAL

RELATED APPLICATION
[0001] U.S. patent application Ser. No. 60/198,782, filed Apr. 20, 2000.
[0002] The present invention relates to methods and apparatus for semantically processing natural language text in a digital computer such that use of the processed data or representation shall lead to more reliable and accurate results than heretofore possible with conventional systems.

BACKGROUND
[0003] One example of such use includes processing user queries into search, retrieval, verification, and display of desired information.
[0004] Another example is to analyze the content of processed information or documents and use such information to create a detailed and indexed knowledge base for user access and interactive display of precise information.
[0005] Reference is made to known systems for extracting, processing, and using SAO (Subject-Action-Object) data embodied in natural language text documents in digital (electronic) form. These prior systems process native language user requests and/or documents to extract and store the SAO triplets existing throughout the document as well as the text segment associated with each SAO and the link between each SAO and the text segment. Links are also stored in association with each text segment and the full source document, which is accessible by user interaction and input.
[0006] Although SAO extraction, processing, and management has advanced the science of artificial intelligence in both stand-alone computer and Web-based systems, there is a need in the art for yet greater accuracy and reliability in the semantic processing of user requests, knowledge base data, and information accessed and obtained on the Web.

SUMMARY OF EXEMPLARY EMBODIMENT OF INVENTION
[0007] It is an object of the present invention to expand the semantic processing power of computers to include not only the SAO but to use a new, more comprehensive, extended Subject-Action-Object (eSAO) format as the foundation for rule based processing, normalization, and management of natural language.
[0008] One skilled in this art will note that whereas the SAOs of prior systems included three components, subject (S), action (A), object (O), the expanded SAO (hereafter "eSAO") includes a minimum of four components and fields and preferably seven components and fields. These additional fields include adjectives, prepositions, etc., more fully described below. In one exemplary embodiment, an eighth field is preferably provided into which all other components can be placed. These other components and the eighth field are called constraints. Where the knowledge base or information in local and remote databases is to be accessed in response to a user request (or query), the system preferably uses the same rules and number of fields to process the natural language user request as it uses to process candidate or stored documents for presentation to the user.
[0009] Thus, the Semantic Processor for User Request Analysis according to the principles of the present invention aims at analyzing and classifying different types of user requests in order to create their formal representation (in the form of a set of certain fields and relations between them), which enables more effective and efficient answer search in local and remote databases, information networks, etc. Also, the output search patterns can be used to search for matching eSAOs in the eSAO Knowledge Base in the system with much more accuracy and reliability than prior systems and methods, even for requests in the form of questions. In addition, the eSAO format enables greater accuracy in obtaining precise information of interest. One exemplary system according to the present invention also forms an eSAO knowledge base or index of stored processed information that can be managed by various rules related to the eSAO components and fields.

DRAWINGS
[0010] Other and further objects and benefits shall become apparent with the following detailed description when taken in view of the appended drawings in which:
[0011] FIG. 1 shows a schematic view of one example of a digital computer system in accordance with the principles of the present invention.
[0012] FIG. 2 is an example of a classification routine for classifying the type of user request usable in the system of FIG. 1.
[0013] FIG. 3 is an example of a parsing routine for the case of user request being key words.
[0014] FIG. 4 is similar to FIG. 3 where user request is a bit (segment) sentence, command sentence or question sentence.
[0015] FIG. 5 shows a parsing routine for the case of user request being "bit", "command", "question" or "complex" query.
[0016] FIG. 6 shows a parsed synonymic search pattern expanding routine.
[0017] FIG. 7 shows a routine for generating the eSAO user request.
[0018] FIG. 8 shows the principal stages of forming an eSAO Knowledge Base or Index (90) and using a user natural language search query for relevant eSAO component and source information display from the knowledge base.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENT OF THE INVENTION
[0019] The following are incorporated herein by reference:
[0020] 1. System and on-line information service presently available at www.cobrain.com and the publicly available user manual therefor.
[0021] 2. The software product presently marketed by Invention Machine Corporation of Boston, USA, under its trademark KNOWLEDGIST® and the publicly available user manual therefor.
[0022] 3. WIPO Publication 00/14651, Published Mar. 16, 2000.
[0023] 4. U.S. patent application Ser. No. 09/541,182 filed Apr. 3, 2000.
[0024] 5. IMC's COBRAIN® server software marketed in the United States and manuals thereof.
[0025] See references Nos. 3, 4, and 5 above for systems and methods of using an SAO format for developing an SAO extracted Knowledge Base.
[0026] The system and method according to the present invention employ a new expanded S-A-O format for semantic processing of documents and for generating a database of expanded SAOs for expanded information search and management.
[0027] Note that the SAOs of prior systems included three components, subject (S), Action (A), Object (O), whereas one example of expanded SAOs (hereafter "eSAO") includes a minimum of 4 classified components up to 7 classified components (preferably 7 classified fields) and, optionally, an 8th field for unclassified components.
[0028] In one example, the Extended SAO (eSAO) components include:
[0029] 1. Subject (S), which performs action A on an object O;
[0030] 2. Action (A), performed by subject S on an object O;
[0031] 3. Object (O), acted upon by subject S with action A;
[0032] 4. Adjective (Adj.): an adjective which characterizes subject S or action A which follows the subject, in a SAO with empty object O (ex: "The invention is efficient", "The water becomes hot");
[0033] 5. Preposition (Prep.): a preposition which governs Indirect Object (ex: "The lamp is placed on the table", "The device reduces friction by ultrasound");
[0034] 6. Indirect object (iO): a component of a sentence manifested, as a rule, by a notional phrase, which together with a preposition characterizes action, being an adverbial modifier (ex: "The lamp is placed on the table", "The light at the top is dim", "The device reduces friction by ultrasound");
[0035] 7. Adverbial (Adv.): a component of a sentence which characterizes, as a rule, the conditions of performing action A (ex: "The process is slowly modified.", "The driver must not turn the steering wheel in such a manner.").
[0036] Examples of application of the eSAO format are:

1. Input: Is the moon really blue during a blue moon?
Output:
  Subject: moon
  Action: be
  Object: -
  Preposition: during
  Indirect Object: blue moon
  Adjective: really blue
  Adverbial: -

2. Input: Does the moon always keep the same face towards the Earth?
Output:
  Subject: moon
  Action: keep
  Object: same face
  Preposition: towards
  Indirect Object: Earth
  Adjective: -
  Adverbial: always

3. Input: The dephasing waveguide is fitted with a thin dielectric semicircle at one end, and a guide cascaded with the dephasing element completely suppresses unwanted modes.
Output:
  Subject: guide cascaded with the dephasing element
  Action: suppress
  Object: unwanted mode
  Preposition: -
  Indirect Object: -
  Adjective: -
  Adverbial: completely

4. Input: It was found that the maximum value of x is dependent on the ionic radius of the lanthanide element.
Output:
  Subject: maximum value of x
  Action: be
  Object: -
  Preposition: on
  Indirect Object: ionic radius of the lanthanide element
  Adjective: dependent
  Adverbial: -

5. Input: This was true even though the BN interphase reacted and vaporized because of water vapor in the atmosphere at intermediate temperatures and glass formation occurred at higher temperatures.
Output:
  Subject: glass formation
  Action: occur
  Object: -
  Preposition: at
  Indirect Object: higher temperature
  Adjective: -
  Adverbial: -

6. Input: The composites were infiltrated under vacuum, cured at 100 degree C, and precalcined in air at 700 degree C.
Output:
  Subject: -
  Action: infiltrate
  Object: composite
  Preposition: under
  Indirect Object: vacuum
  Adjective: -
  Adverbial: -

[0037] In addition, Subject S, Object O and Indirect Object iO have their inner structure, which is recognized by the system and includes the components proper (Sm, Om, iOm) and their attributes (Attr (Sm), Attr (Om), Attr (iOm)). The elements of each of the pairs are in semantic relation P between each other.
[0038] If, for purposes of the following description, we denote any of the elements Sm, Om, iOm as Om, then Subject S, Object O and Indirect Object iO are predicate elements of the type P(Attr (Om), Om). The system considers and recognizes the following types of relation P: Feature (Parameter, Color, etc.), Inclusion, Placement, Formation, Connection, Separation, Transfer, etc.
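One way to picture the seven classified fields and the optional constraint field is as a small record type. The Python sketch below is illustrative only: the class name `ESAO`, its attribute names, and the `filled` helper are assumptions made for this example and do not appear in the application. The sample values are those given above for the request "Does the moon always keep the same face towards the Earth?".

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ESAO:
    """Hypothetical container for one eSAO: seven classified fields plus constraints."""
    subject: Optional[str] = None
    action: Optional[str] = None
    obj: Optional[str] = None              # "Object" field; renamed to avoid the builtin
    preposition: Optional[str] = None
    indirect_object: Optional[str] = None
    adjective: Optional[str] = None
    adverbial: Optional[str] = None
    constraints: List[str] = field(default_factory=list)  # optional 8th field

    def filled(self) -> dict:
        """Return only the fields that are not empty."""
        return {name: value for name, value in asdict(self).items() if value}


# Field values as listed in the text for the moon request.
example = ESAO(
    subject="moon",
    action="keep",
    obj="same face",
    preposition="towards",
    indirect_object="Earth",
    adverbial="always",
)

print(example.filled())
```

Empty fields are simply left as `None`, which mirrors the empty entries in the example outputs above.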
[0039] Examples (only sentence fragments are given here, which correspond to the S or O or iO):
[0040] 1. Input: Ce-TZP materials with CeO2 content
Output: P=Formation/with; Attr (Om)=CeO2 content; Om=Ce-TZP materials
[0041] 2. Input: rotational speed of freely suspended cylinder
Output: P=Feature (Parameter)/of; Attr (Om)=rotational speed; Om=freely suspended cylinder
[0042] 3. Input: ruby color of Satsuma glass
Output: P=Feature (Color)/of; Attr (Om)=ruby color; Om=Satsuma glass
[0043] 4. Input: micro-cracks situated between sintered grains
Output: P=Placement/situated between; Attr (Om)=sintered grains; Om=micro-cracks
[0044] 5. Input: precursor derived from hydrocarbon gas
Output: P=Formation/derived from; Attr (Om)=hydrocarbon gas; Om=precursor
[0045] 6. Input: dissipation driver coupled to power dissipator
Output: P=Connection/coupled to; Attr (Om)=power dissipator; Om=dissipation driver
[0046] 7. Input: lymphoid cells isolated from blood of AIDS infected people
Output: P=Separation/isolated from; Attr (Om)=blood of AIDS infected people; Om=lymphoid cells
[0047] 8. Input: one-dimensional hologram pattern transferred to matrix electrode
Output: P=Transfer/transferred to; Attr (Om)=matrix electrode; Om=one-dimensional hologram pattern
[0048] It is clear that the components Om proper can also be predicate elements (in the examples given above these are, for instance, Ex. No. 2: Om=freely suspended cylinder, Ex. No. 8: Om=one-dimensional hologram pattern). It should be noted that for information retrieval purposes it is more important to recognize the structure of Subject, Object and Indirect Object, that is Attr (Om) and Om, than the types of relation P, because that structure is the basis of the algorithm of transition to the less relevant search patterns.
[0049] The Semantic Processor for User Request Analysis according to the principles of the present invention aims at analyzing and classifying different types of user requests in order to create their formal representation (in the form of a set of certain fields and relations between them), which enables more effective and efficient search for information or documents in local and remote databases, knowledge bases, information networks, etc.
[0050] The Semantic Processor (FIG. 1) receives User Request 2 as input data. Using Linguistic KB 12, the Semantic Processor identifies or classifies the type of request as described below (Unit 4) and performs eSAO analysis of the request in accordance with its type (Unit 6). Then, a number of search patterns is generated corresponding to the input user request, which represent its formal description designed for answer search (Unit 10) in databases, information networks, etc.
[0051] The Semantic Processor analyzes the following basic types of requests (FIG. 2).
[0052] 1. Keywords (18)
[0053] Keywords is a type of user request where words are organized into a Boolean expression using predetermined grammar rules. In one example, it comprises 6 rules for infix, prefix and brackets operators. The following operators are implemented: AND, OR, XOR, NEAR, NOT and brackets. The operators may be expressed in a user request in different ways; for instance, AND can be written as 'AND', '&', '&&', '+'.
[0054] User request example:
[0055] "('laser' NEAR 'beam') && 'heating'"
[0056] 2. Bit sentence (20)
[0057] Bit sentence is a type of user request representing a part of a sentence or sentence segment (incomplete sentence) which corresponds to a certain semantic element: process, object, function (action+object), etc.
[0058] User request examples:
[0059] (a) solid state laser system
[0060] (b) decrease friction
[0061] 3. Statement (22)
[0062] Statement is a type of request which is a grammatically correct imperative sentence.
[0063] User request example:
[0064] Give me the number of employees in your company.
[0065] 4. Question sentence (24)
[0066] Question sentence is a type of request which is a grammatically correct interrogative sentence.
[0067] User request examples:
[0068] (a) What causes fuel cell degradation?
[0069] (b) What is the chemical composition of the ocean?
[0070] (c) Do the continents move?
[0071] 5. Complex query (25)
[0072] Complex query is a type of request which is expressed by several sentences, i.e. by a fragment of text.
[0073] User request examples:
[0074] (a) Is there anything I can give my one-month old son to relieve gas pain? I think he may have colic.
[0075] (b) My 15-year-old son has recently been diagnosed with recurrent shoulder dislocation. Lately he got worse. How is recurrent shoulder dislocation treated?
[0076] (c) Because I have a chronic stuffed nose and no sense of taste, I have been taking a prescribed medicine (Claritin D). Is there a time limit after which this medicine will no longer have an effect? If so, what else can I take?
[0077] (d) Three years ago, after months of extreme fatigue, general aches and pains and stomach problems, my family doctor gave me a diagnosis of Epstein-Barr. He said my titers were 5100. Recently I went to an internist, who ran numerous blood tests and said she thinks that I have fibromyalgia. She doesn't believe in the Epstein-Barr diagnosis. I am now being referred to a rheumatologist. Is there such a thing as Chronic Epstein-Barr? And what is the difference between Epstein-Barr and fibromyalgia?
[0078] After the type of request has been classified, the request is forwarded to the eSAO module for further analysis (Unit 6).
[0079] If the request has been recognized as "Keywords", i.e. it satisfies the rules of Boolean grammar, the Semantic Processor converts the request into standard notation. See FIG. 3. For example:
[0080] Input
[0081] "('laser' NEAR 'beam') && 'heating'"
[0082] Output
[0083] ((laser) NEAR (beam)) AND (heating)
[0084] If the request is of the type "bit" or "command" or "question sentence" or "complex query", the eSAO Processor (FIG. 4) performs its tagging (Unit 36), recognition of the introductory part of the request (Unit 37), parsing (Unit 38), conversion (Unit 40) if the request type is "question sentence", and semantic analysis (eSAO extraction) (Unit 42), and outputs a formal representation of the original request in the form of a set of predetermined fields.
[0085] At the step of tagging (Unit 36), each word of the request is assigned a Part-Of-Speech tag (its lexical-grammatical class). The analysis used here (see above identified references Nos. 3 and 4) is supplemented with statistical data obtained on a specially collected question corpus. This provides highly correct POS-tagging. In the case of a "bit sentence" several variants are possible.
[0086] For instance:
[0087] Input
[0088] clean water
[0089] Output
[0090] (a) clean_JJ water_NN
[0091] (b) clean_VB water_NN
[0092] where JJ stands for adjective, VB for verb, NN for noun.
[0093] Then (Unit 37) the introductory part of the query is recognized, which is a sequence of words at the beginning of the query, none of which is a keyword for the given query. For example:
[0094] a) Could you tell me . . .
[0095] b) Is it true, that . . .
[0096] c) I want . . .
[0097] This part of the query is excluded from further processing or analysis. The recognition of the introductory part is performed by means of patterns, making use of separate lexical units and tags.
[0098] For example:
[0099] a) <PP BE (interested|wondering) (if|whether)>
[0100] This pattern removes, for example, the following part from the user's query:
[0101] I am wondering if . . .
[0102] b) <MD PP VB PP [,]>
[0103] This pattern removes, for example, the following part from the user's query:
[0104] Could you tell me . . .
[0105] At the step of parsing (FIG. 4), verbal sequences (Unit 50) and noun phrases (Unit 52) are recognized from the tagged request (FIG. 5) and a syntactical parse tree is built (Unit 54).
[0106] This module includes stored Recognizing Linguistic Models for Syntactic Phrase Tree Construction. They describe rules for structurization of the sentence, i.e. for correlating part-of-speech tags, syntactic and semantic classes, etc., which are used by Text parsing and SAO extraction for building Syntactic and Functional phrases (see Reference No. 4 (i.e. U.S. patent application Ser. No. 09/541,182), page 36, section "Tree Construction").
[0107] The Syntactical Phrase Tree Construction is based on context-sensitive rules to create syntactic groups, or nodes in the parse tree.
[0108] A core context-sensitive rule can be defined by the following formula:
[0109] UNITE
[0110] [element_1 . . . element_n] AS Group_X
[0111] IF
[0112] left_context=L_context_1 . . . L_context_n
[0113] right_context=R_context_1 . . . R_context_n
[0114] which means that the string in brackets (element_1 . . . element_n) must be united and further regarded as a syntactic group of a particular kind, Group_X in this case, if elements to the left of the string conform to the string defined by the left_context expression, and elements to the right of the string conform to the string defined by the right_context expression.
[0115] Elements here can be POS-tags or groups formed by the UNITE command.
[0116] All sequences of elements can consist of one or more elements.
[0117] One or both of the context strings defined by left_context and right_context may be empty.
[0118] The context-sensitive rules are applied to a sentence in a backward scanning, from the end of the sentence to the beginning, element by element, position by position. If the present element or elements are the ones defined in brackets in one of the context-sensitive rules, and the context restricting conditions are satisfied, these elements are united as a syntactic group, or node, in the parse tree. After that the scanning process returns to the last position of the sentence, and the scan begins again. The scanning process is over only when it reaches the beginning of the sentence without starting any rule. Preferably, after a context-sensitive rule has been applied, elements united into a group become inaccessible for further context-sensitive rules; instead, this group represents these elements.
[0119] A simple example illustrates the above mentioned stages.
[0120] Input Sentence
[0121] The device has an open distal end.
[0122] The_DEF_ARTICLE device_NOUN has_HAVE_s an_INDEF_ARTICLE open_ADJ distal_ADJ end_NOUN ._PERIOD
Grammar:
[0123] BEGIN_BACKWARD_STAGE
[0124] UNITE
[0125] [(ADJ or NOUN) (NOUN or Noun_Group)] AS Noun_Group
[0126] IF
[0127] left_context=empty
[0128] right_context=empty
[0129] UNITE
[0130] [(DEF_ARTICLE or INDEF_ARTICLE) (NOUN or Noun_Group)]
[0131] AS Noun_Group
[0132] IF
[0133] left_context=empty
[0134] right_context=empty
[0135] END_BACKWARD_STAGE
[0136] Rule 1 (ADJ and NOUN): Pass 1
[0137] The_DEF_ARTICLE device_NOUN has_HAVE_s an_INDEF_ARTICLE open_ADJ (Noun_Group: distal_ADJ end_NOUN) ._PERIOD
[0138] Rule 1 (ADJ and Noun_Group): Pass 2
[0139] The_DEF_ARTICLE device_NOUN has_HAVE_s an_INDEF_ARTICLE (Noun_Group: open_ADJ (Noun_Group: distal_ADJ end_NOUN)) ._PERIOD
[0140] Rule 2 (INDEF_ARTICLE and Noun_Group): Pass 3
[0141] The_DEF_ARTICLE device_NOUN has_HAVE_s (Noun_Group: an_INDEF_ARTICLE (Noun_Group: open_ADJ (Noun_Group: distal_ADJ end_NOUN))) ._PERIOD
[0142] Rule 2 (DEF_ARTICLE and NOUN): Pass 4
[0143] (Noun_Group: The_DEF_ARTICLE device_NOUN) has_HAVE_s
[0144] (Noun_Group: an_INDEF_ARTICLE (Noun_Group: open_ADJ
[0145] (Noun_Group: distal_ADJ end_NOUN))) ._PERIOD
[0146] Now there exist two nodes, or groups: noun groups. Only one more rule is needed to unite a noun group, HAS-verb and one more noun group as a sentence.
[0147] Thus, the first stage in parsing deals with POS-tags, then sequences of POS-tags are gradually substituted by syntactic groups, these groups are then substituted by other groups, higher in the sentence hierarchy, thus building a multi-level syntactic structure of the sentence in the form of a tree.
[0148] For instance (first, the results are presented for the four sentences given above):

1) The dephasing waveguide is fitted with a thin dielectric semicircle at one end, and a guide cascaded with the dephasing element completely suppresses unwanted modes.
  w_Sentence w_N_XX w_NN a_AT guide_NN w_VBN_XX cascaded_VBN w_IN_N with_IN w_NN the_ATI w_NN dephasing_NN element_NN w_VBZ_XX w_VBZ completely_RB suppresses_VBZ w_NNS unwanted_JJ modes_NNS

2) It was found that the maximum value of x is dependent on the ionic radius of the lanthanide element.
  w_Sentence w_NN w_NN the_ATI w_NN maximum_JJ value_NN of_IN x_NP w_BEX_XX is_BEZ w_JJ_XX dependent_JJ w_IN_N on_IN w_NN w_NN the_ATI w_NN ionic_JJ radius_NN of_IN w_NN the_ATI w_NN lanthanide_NN element_NN

3) This was true even though the BN interphase reacted and vaporized because of water vapor in the atmosphere at intermediate temperatures and glass formation occurred at higher temperatures.
  w_Sentence w_NN glass_NN formation_NN w_VBD_XX occurred_VBD w_IN_N at_IN w_NNS higher_JJR temperatures_NNS

4) The composites were infiltrated under vacuum, cured at 100 degree C, and precalcined in air at 700 degree C.
  w_Sentence w_NNS The_ATI composites_NNS w_BEX_XX were_BED w_VBN_XX infiltrated_VBN w_IN_N under_IN vacuum_NN

5) "bit sentence" type
Input: clean water
Output:
  a) <w_NN> <clean_JJ> clean_JJ <water_NN> water_NN
  b) <w_VP_XX> <clean_VB> clean_VP <water_NN> water_NN

6) "question sentence" type
Input: What causes fuel cell degradation?
Output:
  <w_q_Sentence>
  <What_WDT> What_WDT
  <w_VBZ_XX>
  <causes_VBZ> causes_VBZ
  <w_NN>
  <fuel_NN> fuel_NN
  <w_NN>
  <cell_NN> cell_NN
  <degradation_NN> degradation_NN

[0149] At the stage of question transformation or conversion (FIG. 6), in the case of a "question sentence", the question structure is first recognized according to its general description (Unit 62). This formal description concerns only that introductory part of the query, or the whole query, which will be transformed later on, and it is given in the following Backus-Naur notation:
[0150] 1. <Question>::=[<Wh-group>]<First Verbal Group>NG
[0151] [<Second Verbal Group>]
[0152] Notes: a) [X] means that the X element may be absent;
[0153] b) NG: noun group;
[0154] 2. <Wh-group>::=[Pr]<Wh>[NG]
[0155] Notes: Pr: preposition;
[0157] Notes: a) enc_X represents a lexical unit with a terminal symbol X, being its POS-tag;
[0158] b) enc_WP, enc_WRB and enc_WDT tags cover all possible question words: how long, how much, how many, when, why, how, where, which, who, whom, whose, what.
[0159] It should be noted that the above-described grammar is built so as not to process questions posed to syntactic subjects ("What food can reduce cholesterol in blood?", "Who killed Kennedy?"), because the word order in these questions is direct (statement-like) and does not need to be changed. Besides, the remaining part of the question we mark as TL ("tail").
[0163] In one example of the converting step 40, the elements in the right side of formula 1 are enumerated:
[0164] 1. <Wh-group>
[0165] 2. <First Verbal Group>
[0166] 3. NG
[0167] 4. <Second Verbal Group>
and TL is marked as 5.
[0168] Then, the formula of the query itself will be:
[0169] request=(1,2,3,4,5)
[0170] In some cases certain elements of the formula may be absent.
[0171] For example:
[0172] a) What is the chemical composition of the ocean? → 1 (What) 2 (is) 3 (the chemical composition of the ocean) 4 ( ) 5 ( )?
[0173] b) Do the continents move? → 1 ( ) 2 (Do) 3 (the continents) 4 (move) 5 ( )?
[0174] c) How much did it help? → 1 (How much) 2 (did) 3 (it) 4 (help) 5 ( )?
[0175] d) 1 (What company) 2 (is) 3 (John) 4 (working) 5 (at the moment for) → 3 (John) 2 (is) 4 (working) 5 (at the moment for) 1 (What company)
[0176] e) 1 (For what company) 2 (is) 3 (John) 4 (working) 5 (at the moment) → 3 (John) 2 (is) 4 (working) 1 (for what company) 5 (at the moment)
[0177] After the structural formula of the request has been defined, the question is converted (Unit 64) according to the following rule:
[0178] (1 2 3 4 5) → (3 2 4 1 5)
[0179] or
[0180] (1 2 3 4 5) → (3 2 4 5 1)
[0181] The second formula may be regarded as a special type of the first one, connected with grammatical peculiarities of the question.
[0182] For example:
[0183] a) 1 (What) 2 (is) 3 (the chemical composition of the ocean) 4 ( ) 5 ( )? → 3 (the chemical composition of the ocean) 2 (is) 4 ( ) 1 (What) 5 ( )
[0184] b) 1 ( ) 2 (Do) 3 (the continents) 4 (move) 5 ( )? → 3 (the continents) 2 (Do) 4 (move) 1 ( ) 5 ( )
[0185] c) 1 (How much) 2 (did) 3 (it) 4 (help) 5 ( )? → 3 (it) 2 (did) 4 (help) 1 (How much) 5 ( )
[0186] d) 1 (What company) 2 (is) 3 (John) 4 (working) 5 (at the moment for) → 3 (John) 2 (is) 4 (working) 5 (at the moment for) 1 (what company)
[0187] e) 1 (For what company) 2 (is) 3 (John) 4 (working) 5 (at the moment) → 3 (John) 2 (is) 4 (working) 1 (for what company) 5 (at the moment)
[0188] The described transformations turn the questions into narrative form, which can be easily translated into the search pattern.
[0189] Then, the converted request is subjected to the "question word substitution". In accordance with special rules, question words are substituted with certain so-called "null words" so as not to corrupt the sentence structure:

  What → Something1
  Which → Some
  How → Somehow
  Who → Someone1
  How long → Sometime
  Whom → Someone2
  How much → Something2
  How many → Something3
  Where → Somewhere
  When → Time clause
  Why → Reason clause
  Whose → Somebody's

[0190] Then the parsed converted request is submitted to User Request eSAO extraction 44.
[0191] At the stage of eSAO extraction (FIG. 7), in the user request (in all cases except the "keywords" case) semantic elements are recognized of the type S-subject (Unit 74), A-action (Unit 72), O-object (Unit 74) as well as their attributes expressed via preposition, indirect object, adjective, adverbial, as well as the inner structure (the components proper and the attributes) of Subject S, Object O and Indirect Object iO.
[0192] The recognition of all these elements is implemented by means of corresponding Recognizing Linguistic Models (see Reference No. 4 (i.e. U.S. patent application Ser. No. 09/541,182), page 41, section "SAO Recognition"). These models describe rules that use part-of-speech tags, lexemes and syntactic categories, which are then used to extract from the parsed text eSAOs with finite actions, non-finite actions, verbal nouns. One example of Action extraction is:
[0194] This rule means that "if an input sentence contains a sequence of words w1, w2, w3 which at the step of part-of-speech tagging obtained HVZ, BEN, VBN tags respectively, then the word with VBN tag in this sequence is in Action".
[0195] For example,
[0196] has_HVZ been_BEN produced_VBN => (A=produced)
[0197] The rules for extraction of Subject, Action and Object are formed as follows:
[0198] 1. To extract the Action, tag chains are built, e.g., manually, for all possible verb forms in active and passive voice with the help of the Classifier (block 3). For example, have been produced=<HVZ><BEN><VBN>.
[0199] 2. In each tag chain the tag is indicated corresponding to the main notion verb (in the above example, <VBN>). Also, the type of the tag chain (active or passive voice) is indicated.
[0200] 3. The tag chains with corresponding indexes formed at steps 1-2 constitute the basis for linguistic modules extracting Action, Subject and Object. Noun groups constituting Subject and Object are determined according to the type of tag chain (active or passive voice).
[0201] The recognition of such elements as Indirect Object, Adjective and Adverbial is implemented in the same way, that is, taking into account the tags and the structure itself of the Syntactical Phrase Tree.
[0202] Recognition of Subject, Object and Indirect Object attributes is carried out on the basis of corresponding Recognizing Linguistic Models. These models describe rules (algorithms) for detecting subjects, objects, their attributes (placement, inclusion, parameter, etc.) and their meanings in the syntactic tree.
[0203] To identify parameters of an Object (Indirect Object, Subject), a Parameter Dictionary is used. A standard dictionary defines whether a noun is an object or a parameter of an object. Thus, a list of such attributes can easily be developed and stored in Linguistic KB (Block 80).
For example, temperature (=parameter) of water (=object). To identify attributes such as placement, inclusion etc., Lin guistic KB includes a list of attribute identi?ers, i.e. certain lexical units. For example, to place, to install, to comprise, to contain, to include etc. Using such lists, the system may automatically mark the eSAOs extracted by eSAO extractor which correspond to said attributes. [0204] These algorithms work with noun groups and act like linguistic patterns that control extraction of above mentioned relations from the text. For example, for the relations of type parameter-object, basic patterns are [0205] <Parameter> of <Object> [0206] and [0207] <Object> <Parameter> Jan. 24, 2002 US 2002/0010574 A1 [0208] The relation is valid only When the lexeme Which corresponds to <parameter> is found in the list of parameters included in Linguistic KB. [0209] These models are used by Unit 76 of eSAO extrac tion module. The output of the unit is a set of 7 ?elds, Where some of the ?elds may be empty. [0210] For example (for the highlighted fragments of the ?rst tWo sentences given above): [0211] 1) The dephasing Waveguide is ?tted With a thin dielectric semicircle at one end, and a guide cascaded With the dephasing element completely suppresses unWanted modes. [0212] Subject: guide cascaded With the dephasing element [0213] Action: suppresses [0214] Object: unWanted modes [0215] Preposition [0216] IndirectObject [0217] Adjective [0218] Adverbial: completely [0219] 2) It Was found that the maximum value of X is dependent on the ionic radius of the lanthanide element. [0220] Subject: maximum value of x [0221] Action: be [0222] Object [0223] Preposition: on [0224] IndirectObject: the ionic radius of the lan thanide [0225] element [0226] Adjective: dependent [0227] Adverbial [0228] At the stage 77 User Request eSAO Extractor recogniZes constraints, i.e., those lexical units of the query, Which are not parts of eSAO. 
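The question conversion rule of [0177]-[0181] and the null-word substitution of [0189] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `convert_question`, `substitute_null_words` and `to_narrative`, the dictionary of numbered slots, and the `tail_first` flag are all hypothetical. The application does not say how the choice between (3 2 4 1 5) and (3 2 4 5 1) is made, so the sketch leaves that choice to the caller. Only a subset of the null-word pairs is included.

```python
# Subset of the question-word -> null-word pairs listed in [0189].
NULL_WORDS = {
    "how much": "Something2",
    "how many": "Something3",
    "how long": "Sometime",
    "what": "Something1",
    "which": "Some",
    "how": "Somehow",
    "who": "Someone1",
    "whom": "Someone2",
    "where": "Somewhere",
    "whose": "Somebody's",
}


def convert_question(slots, tail_first=False):
    """Reorder the five numbered slots of a parsed question.

    slots maps positions 1..5 to text ('' when the element is absent).
    tail_first=False applies (1 2 3 4 5) -> (3 2 4 1 5);
    tail_first=True applies (1 2 3 4 5) -> (3 2 4 5 1).
    """
    order = (3, 2, 4, 5, 1) if tail_first else (3, 2, 4, 1, 5)
    return [(pos, slots.get(pos, "")) for pos in order]


def substitute_null_words(text):
    """Replace a leading question word with its null word (assumed rule)."""
    lowered = text.lower()
    # Try longer question words first so "how much" wins over "how".
    for question_word in sorted(NULL_WORDS, key=len, reverse=True):
        if lowered == question_word or lowered.startswith(question_word + " "):
            return NULL_WORDS[question_word] + text[len(question_word):]
    return text


def to_narrative(slots, tail_first=False):
    """Convert a question to narrative order and substitute null words."""
    parts = []
    for pos, text in convert_question(slots, tail_first):
        if pos == 1:
            text = substitute_null_words(text)
        if text:
            parts.append(text)
    return " ".join(parts)


example_a = {1: "What", 2: "is", 3: "the chemical composition of the ocean", 4: "", 5: ""}
print(to_narrative(example_a))
# prints: the chemical composition of the ocean is Something1
```

Keeping the reordering and the substitution in separate functions mirrors the two separate steps in the text: conversion first, then question word substitution.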
[0229] The constraints can be represented by any lexical unit except:
[0230] (a) Question Words
[0231] enc_WP, enc_WRB, enc_WDT
[0232] Example: what, how, where
[0233] (b) Articles
[0234] enc_AT, enc_ATI
[0235] Example: a, an, the
[0236] (c) Helpers:
[0237] enc_DO, enc_DOD, enc_DOZ, enc_MD, enc_IN, enc_XNOT, enc_TO, enc_HV, enc_HVZ, enc_HVD, enc_BE, enc_BEZ, enc_BER, enc_BED, enc_BEDZ, enc_BEM
[0238] Example: do, did, does
[0239] (d) Personal Pronouns
[0240] enc_PPusd, enc_PPusd2, enc_PP1A, enc_PP1AS, enc_PP1O, enc_PP1OS, enc_PP2, enc_PP3, enc_PP3A, enc_PP3AS, enc_PP3O, enc_PP3OS, enc_PPL, enc_PPLS, enc_PP
[0241] Example: I, we, they
[0242] (e) Other Pronouns
[0243] enc_PN, enc_PNq2, enc_PNusd, enc_PNusdq2
[0244] Example: same, each, something
[0245] (f) Determiners: enc_DT, enc_DTusd, enc_DTI, enc_DTS, enc_DTX, enc_EX
[0246] Example: this, those, these
[0247] (g) Because, If
[0248] enc_CS
[0249] Example: because, if, since, after
[0250] (h) Punctuation:
[0251] enc_Exclamatory, enc_AmpersandFW, enc_RLBracket, enc_RRBracket, enc_LeftQuote, enc_RightQuote,
[0252] enc_MultipleMinus, enc_Comma, enc_FullStop,
[0253] enc_Spot3, enc_Colon, enc_Semicolon, enc_Question
[0254] Example: ", ', ?, !, . . .
[0255] (i) Others:
[0256] enc_UH, enc_CC, enc_OD, enc_CD
[0257] Example: Oh!, and, or
[0258] As a result, eSAO extractor 42 outputs eSAO request in the form of a set of, for example, 8 fields where some of the fields may be empty:
[0259] 1. Subject
[0260] 2. Action
[0261] 3. Object
[0262] 4. Preposition
[0263] 5. Indirect Object
[0264] 6. Adjective
[0265] 7. Adverbial
[0266] 8. Constraints
[0267] Along with that, Subject, Object and Indirect Object each have inner structure, as described above.
[0268] In case of "bit sentence" and "complex query", more than one set of fields is possible. For instance:
[0269] ("Bit Sentence")
[0270] Input: clean water
[0271] Output:
[0272] (a)
[0273] Subject:
[0274] Action:
[0275] Object: clean water
[0276] Preposition:
[0277] Indirect Object:
[0278] Adjective:
[0279] Adverbial:
[0280] Constraints:
[0281] (b)
[0282] Subject:
[0283] Action: clean
[0284] Object: water
[0285] Preposition:
[0286] Indirect Object:
[0287] Adjective:
[0288] Adverbial:
[0289] Constraints:
[0290] ("Statement")
[0291] Input: Give me the number of employees in IMC Company
[0292] Output:
[0293] Subject:
[0294] Action:
[0295] Object: number of employees in IMC company
[0296] Preposition:
[0297] Indirect Object:
[0298] Adjective:
[0299] Adverbial:
[0300] Constraints:
[0301] ("Question")
[0302] Input: What is the chemical composition of the ocean?
[0303] Output:
[0304] Subject: chemical composition of the ocean
[0305] Action: is
[0306] Object: What
[0307] Preposition:
[0308] Indirect Object:
[0309] Adjective:
[0310] Adverbial:
[0311] Constraints:
[0312] ("Question")
[0313] Input: Do the continents move?
[0314] Output:
[0315] Subject: continents
[0316] Action: move
[0317] Object:
[0318] Preposition:
[0319] Indirect Object:
[0320] Adjective:
[0321] Adverbial:
[0322] Constraints:
[0323] ("Complex Query")
[0324] Input: My 15-year-old son has recently been diagnosed with recurrent shoulder dislocation. Lately he got worse. How is recurrent shoulder dislocation treated?
[0325] Output:
[0326] Subject:
[0327] Action: treat
[0328] Object: recurrent shoulder dislocation
[0329] Preposition:
[0330] Indirect object:
[0331] Adjective:
[0332] Adverbial:
[0333] Constraints: 15-year-old, son, diagnose
[0334] At the final stage of processing the user request Semantic Processor forms Search Patterns which are Boolean expressions in case of "keywords", and eSAOs in other [illegible] request.
[0335] For example:
[0336] ("Bit Sentence")
[0337] Input: clean water
[0338] Output:
[0339] (a)
[0340] Subject: any
[0341] Action: any
[0342] Object: clean water
[0343] Preposition: any
[0344] Indirect Object: any
[0345] Adjective: any
[0346] Adverbial: any
[0347] Constraints: any
[0348] (b)
[0349] Subject: any
[0350] Action: clean
[0351] Object: water
[0352] Preposition: any
[0353] Indirect Object: any
[0354] Adjective: any
[0355] Adverbial: any
[0356] Constraints: any
[0357] ("Statement")
[0358] Input: Give me the number of employees in IMC company.
[0359] Output:
[0360] Subject: Something1
[0361] Action: any
[0362] Object: number of employees in IMC company
[0363] Preposition: any
[0364] Indirect Object: any
[0365] Adjective: any
[0366] Adverbial: any
[0367] Constraints: any
[0368] ("Question")
[0369] Input: What is the chemical composition of the ocean?
[0370] Output:
[0371] Subject: chemical composition of the ocean
[0372] Action: be
[0373] Object: ?
[0374] Preposition: any
[0375] Indirect Object: any
[0376] Adjective: any
[0377] Adverbial: any
[0378] Constraints: any
[0379] ("Question")
[0380] Input: Do the continents move?
[0381] Output:
[0382] Subject: continents
[0383] Action: move
[0384] Object: any
[0385] Preposition: any
[0386] Indirect Object: any
[0387] Adjective: any
[0388] Adverbial: any
[0389] Constraints: any
[0390] ("Complex Query")
[0391] Input: My 15-year-old son has recently been diagnosed with recurrent shoulder dislocation. Lately he got worse. How is recurrent shoulder dislocation treated?
[0392] Output:
[0393] Subject: something1
[0394] Action: treat
[0395] Object: recurrent shoulder dislocation
[0396] Preposition: any
[0397] Indirect object: any
[0398] Adjective: any
[0399] Adverbial: any
[0400] Constraints: 15-year-old, son, diagnose
[0401] If no eSAO field contains the "?" sign, that means the question is general. Absence of an element in a field ("any") means that this field can contain anything.
[0402] Functionality of all modules of the Semantic Processor is maintained by Linguistic Knowledge Base 12 which includes Database (dictionaries, classifiers, statistical data, etc.) and Database of Recognizing Linguistic Models (for text-to-words splitting, recognition of noun phrases, verb phrases, subject, object, action, attribute, "type-of sentence" recognition, etc). See References Nos. 3, 4, and 5 above.
[0403] Thus, the output search patterns at 10 in FIG.
1 can be used to search for matching eSAO's in eSAO Knowledge Base in the system with much more accuracy and reliability than prior systems and methods even for requests being in the form of questions. In addition, the eSAO format enables greater accuracy in obtaining precise information of interest.
[0404] Simultaneously, the user is offered the opportunity to receive possibly less relevant information, owing to the strategy of less strict identity between the corresponding fields in search patterns and in documents processed during the search. Thus, for example, in the case of the last example:
[0405] Subject: something
[0406] Action: treat
[0407] Object: recurrent shoulder dislocation
[0408] Preposition: any
[0409] Indirect object: any
[0410] Adjective: any
[0411] Adverbial: any
[0412] Constraints: 15-year-old, son, diagnose
[0413] Semantic Processor additionally can form a set of less relevant search patterns, by means of gradual refusal of "Constraints" field elements and, further, of recognized "Object" attributes, owing to:
[0414] recurrent=Attr (shoulder dislocation)
[0415] shoulder=Attr (dislocation)
[0416] Thus, the less relevant search pattern will be:
[0417] Subject: something
[0418] Action: treat
[0419] Object: dislocation
[0420] Preposition: any
[0421] Indirect object: any
[0422] Adjective: any
[0423] Adverbial: any
[0424] Constraints: any
[0425] Note the constraint has been removed, which can be in response to a user-entered command.
[0426] With reference to FIG. 8, the query driven information search 84 includes a semantic eSAO processing 86, 88 for creating eSAO structures index or Knowledge Base (including links to documents) 90 of source documents 80 and eSAO search patterns 92 of user requests 82. See references nos. 2 and 4 for further details on creating one example of a Knowledge Base. The present Knowledge Base, however, can have up to 8 fields for the eSAO structures and constraints as described above. The search module 84 further includes comparative analysis 92 of eSAO search patterns 92 of user requests and eSAO structures index 90 of source documents. The comparative analysis 92 identifies the eSAO structures 96 of source documents, which are most relevant for eSAO search patterns of given user requests. These structures can be displayed to the user in order of relevance and the full source sentence of user selected structure and link to the full document can be displayed. User selection of the document link shall access the full source document for display of the paragraph or paragraph segment that includes the eSAO components which can be highlighted for quick recognition. This document display is scrollable through the entire document, see references nos. 2, 4, and 5 for further details of these functions.
[0427] It will be understood that various modification and improvements can be made to the herein disclosed exemplary embodiments without departing from the spirit and scope of the present invention.
We claim:
1. In a digital computer, the method of processing a natural language expression entered or downloaded to the computer comprising:
identifying in the expression expanded subject, action, object (eSAO) components comprising at least four components including subject, action, object components and at least one additional component from the class of preposition component, indirect object component, adjective component, and adverbial component, and
extracting each of said at least four components for designation into a respective subject, action, object field and at least one respective field from the class of preposition field, indirect object field, adjective field, and adverbial field, and
using the components in at least certain ones of said fields for at least one of (i) component display to the user, (ii) forming a search pattern of a user request for information search of local or on-line databases, and (iii) forming an eSAO knowledge base.
2. In the method of claim 1 wherein,
the expression comprises a user request for information search, said method further comprising
classifying the expression into at least one category from the class that includes bit sentence, statement sentence, question sentence, and complex query, and
simplifying the user request search pattern by applying rules in accordance with the respective expression category.
3. In the method of claim 2 wherein,
the rules include transforming a question sentence rules according to
12345 -> 32415 or 12345 -> 32451
wherein
<First Verbal Group>
NG (Noun Group)
<Second Verbal Group>
TL (tail)
4. The method of claim 1 wherein,
the expression comprises a sentence of a document download to the computer and wherein said process comprises
using the components for forming an indexed eSAO knowledge base entry, and
selecting the eSAO entry for display of the eSAO components, or of the source expression that includes the eSAO components, in response to a user request that includes at least a subset of the expression eSAO components.
5. The method of claim 1 wherein,
the expression includes constraint components that includes components that are not classified in any other component type,
said extracting step, further includes extracting constraint components for designation into a constraint field, and
said using step further includes using the components in at least certain ones of said fields for at least one of (i) component display to the user, (ii) forming a search pattern of a user request for information search of local or on-line databases, and (iii) forming an eSAO knowledge base.
6. The method of claim 5 wherein,
the object field includes an object component field segment and an attribute field segment.
7. The method of claim 6 said method further comprising
forming a less relevant user request search pattern by deleting one or more components from the constraint field or one or more attributes from the object field.
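The relaxation of search patterns described in [0413]-[0425] and recited in claim 7 can be sketched in Python as follows. This is a hypothetical illustration only: the `relax` generator, the dictionary layout, and the representation of Object attributes as a list ordered from outermost to innermost are assumptions made for this example and are not structures given in the application. The application also does not state whether constraints are dropped one at a time or all at once; the sketch drops them one at a time.

```python
from copy import deepcopy


def relax(pattern):
    """Yield progressively less strict copies of an eSAO search pattern.

    pattern is a dict with:
      'object_head'  - the core noun of the Object field
      'object_attrs' - recognized Object attributes, outermost first
      'constraints'  - lexical units in the Constraints field
    The input pattern itself is never modified.
    """
    current = deepcopy(pattern)
    # Step 1: give up Constraints field elements one by one.
    while current["constraints"]:
        current["constraints"].pop()
        yield deepcopy(current)
    # Step 2: give up recognized Object attributes, outermost first.
    while current["object_attrs"]:
        current["object_attrs"].pop(0)
        yield deepcopy(current)


def object_text(pattern):
    """Render the Object field from its attributes and head noun."""
    return " ".join(pattern["object_attrs"] + [pattern["object_head"]])


pattern = {
    "subject": "something",
    "action": "treat",
    "object_head": "dislocation",
    "object_attrs": ["recurrent", "shoulder"],
    "constraints": ["15-year-old", "son", "diagnose"],
}
steps = list(relax(pattern))
# The final step has no constraints and the bare Object "dislocation".
print(object_text(steps[-1]), steps[-1]["constraints"])
```

Yielding each intermediate pattern, rather than only the last one, lets a caller stop relaxing as soon as a search returns enough results.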