Diagnostic Training
in Cardiology with an
Intelligent Tutorial System

B. Puppe*, F. Puppe#, B. Reinhardt#

* Department of Medicine, University Hospital, Wuerzburg, Germany

# Computer Science Department, University, Wuerzburg, Germany

Correspondence to:

Dr. med. Bernhard Puppe

University of Wuerzburg

Department of Medicine

Allesgrundweg 12

D-97218 Wuerzburg

Germany

tel: Germany 0931 - 70561-10

fax: Germany 0931 - 70561-20

email: bpuppe@informatik.uni-wuerzburg.de

Abstract

Medical case simulation systems are becoming ever more popular. Most of them lack an explicit representation of their domain's knowledge. Since they require no prior investment in a knowledge base, they allow immediate production of case tutorials. However, producing large numbers of cases is inefficient because each new case requires almost the same development effort as the first. By contrast, knowledge-based tutorial systems allow rapid production of new cases without much further effort once their knowledge base and tutorial component are complete. The latter automatically guides the tutorial dialogue; its feedback to the student is computed by comparing his decisions and choices to those of the program's knowledge base. Accordingly, its design is of crucial importance for the quality of the tutorial dialogue. The issues are discussed and illustrated by a tutorial session with CARDIO-PARACELSUS, a knowledge-based tutorial system for teaching diagnostic case solving in cardiology.

1. Conventional Tutorial Programs

Production of medical simulation programs has grown considerably over the last decade, especially in the USA (e.g. Cyberlog and the RxDx, MEDCAL and Discotest series). From a computer science point of view, most of these programs are relatively simple, since all their interactions with students are explicitly coded. While this allows some of the cases to reach a high level of quality, it also explains why there are so few of them. Since even a case of medium complexity typically encompasses dozens of findings, tests, hypotheses, diagnoses, and therapeutic and management decisions, the development effort for a single case tends to be high. This would be acceptable if the effort decreased with the number of cases developed. However, it does not: it remains high and almost constant. No wonder, then, that most programs provide only a few cases. While it is possible to address many topics of a field in the context of a single case (e.g. by differential diagnostics), the scarcity of cases remains a disadvantage. A more generic approach to case production is urgently needed.

2. Knowledge-Based Tutorial Programs

A major distinguishing feature of knowledge-based tutorial programs is their theoretical - not necessarily practical - ability to support clinical problem solving. In fact, the first of these systems were primarily developed with this goal in mind and only later transformed into tutorials (e.g. MYCIN/GUIDON [2, 4]). Sometimes, a few changes, adaptations and additions are all that is necessary to exploit a good knowledge base as the foundation of an intelligent teaching program. Examples include some of the most famous decision support systems in medicine, such as QMR [13] and ILIAD [10].

Conventional tutorials are devoid of any explicit knowledge of their domains. By contrast, intelligent tutorials represent such knowledge explicitly [15, 20, 21] and also possess a problem solver that exploits it for automatic case diagnosis. These modules, together with a tutorial component, allow such programs to serve as teaching tools: they present cases to students who try to solve them, while the interactive program compares their choices and decisions with its own computed ones and provides appropriate feedback (explanations, comments, critique).

Unlike the developer of a conventional program, the developer of a knowledge-based tutorial does not have to specify explicitly, in advance, every reaction of the program to every possible student input. Instead, appropriate feedback is derived automatically from the program's knowledge base. Depending upon the degree of congruence between knowledge base and student, his input is rated as correct or false. Hence, the basic premise of intelligent tutoring is that the knowledge base knows the correct answers to all questions arising within the context of case solving in its domain. Strictly speaking, this is an ideal never achieved. Therefore, cases intended for use in a knowledge-based tutorial have to be carefully selected, with conscientious verification that the system handles them correctly.
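To make this premise concrete, the following minimal sketch (Python; all names are hypothetical and this is not the implementation of any system discussed here) shows how such feedback can be derived purely by comparing the student's choices with the conclusions the knowledge base itself computed for the case:

```python
# Minimal sketch: tutorial feedback as a comparison between student input
# and the knowledge base's own conclusions. All names are hypothetical.

def rate_choices(student: set, computed: set) -> dict:
    """Rate a set of student choices (e.g. suspected diagnoses) against
    the set derived by the knowledge base for the same case data."""
    correct = student & computed     # congruent choices -> positive feedback
    wrong = student - computed       # incongruent choices -> critique
    missed = computed - student      # omissions -> hints
    score = len(correct) / max(len(computed), 1)
    return {"correct": correct, "wrong": wrong, "missed": missed, "score": score}

# Example: one correct hypothesis, one wrong one, one omission -> score 0.5.
feedback = rate_choices({"aortic regurgitation", "anemia"},
                        {"aortic regurgitation", "congestive heart failure"})
```

The sketch makes the dependency obvious: the quality of the feedback can never exceed the quality of the computed reference set, which is why careful case selection matters.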

The comparative development effort required for the two types of tutorial programs is illustrated in fig. 1.

Fig. 1 here

For the conventional tutor, the development effort rises nearly linearly with the number of elaborated cases. Easy-to-use authoring systems (e.g. HyperCard or SuperCard) eliminate any large initial programming requirement, so production of tutorial cases can start at once. Development of a knowledge-based tutor, on the other hand, requires a huge initial investment in building the knowledge base. After its completion, mass production of tutorial cases begins: they can be produced with only minimal further effort, provided the knowledge base solves the cases correctly.

In practice, it often errs. Identifying the flaws and correcting them is relatively simple. However, the "correction" might adversely affect system performance on related cases.

By contrast, in a conventional system cases do not affect each other because there is no link between them. They are all stand-alone cases. Consequently, inconsistencies between the logic taught in different cases will coexist peacefully, something impossible in a generic (knowledge-based) program.

The first intelligent tutor based on an expert system was GUIDON [4], built on top of the well-known MYCIN and its shell EMYCIN [2]. Insights gained from this work are:

* A general problem solving method - like backward chaining of rules in (E)MYCIN - restricts explainability for tutorial purposes. Strategic knowledge - e.g. when to ask which questions - is coded implicitly in the sequence of rules and of premises inside a rule, making it impossible for the tutorial component to differentiate between accidental and deliberate sequences of questions resulting from the rules.

* Unstructured rule formats make it difficult for the student to differentiate the key clause in the condition of a rule from context and activation clauses.

* Classification of diagnostic rules into categories eases students' memorization of knowledge. Useful categories include (1) rules representing general world knowledge that must be represented in the system although humans take it for granted, (2) rules defining medical terms, (3) rules derived from empirical correlations between findings and diseases whose causality is unknown, and (4) rules with known causality, which should be added as explanation.

Experience has shown that intelligent tutors can be built on top of expert systems. However, to function well they require knowledge that is special in structure and content. Clancey already found it useful to add knowledge to the original MYCIN knowledge base to increase its usefulness for GUIDON.

Whereas this program remained a research prototype, ILIAD [10] and QMR [13] are commercially available tutorial systems based on much larger knowledge bases for internal medicine. Their diagnostic problem solving capability was recently evaluated in [1]. The results indicate that these systems often cannot identify the correct diagnosis, but can support physicians in differential diagnosis. This does not disqualify them for tutorial applications: when collecting cases for an intelligent tutor, the author discards any that the knowledge base does not handle well.

The general architecture of knowledge-based tutors is obvious: in addition to the general components of expert systems - for problem solving, knowledge acquisition, representation and explanation, and the interview/dialogue - a specific tutorial component is necessary for case presentation and feedback; see e.g. [7, 9]. All these modules influence the system's function, and it is important to evaluate empirically how exactly their particular specifications and properties affect system performance and student motivation.

The teaching quality of the tutorial cases of conventional systems is often excellent (accounting for the commercial success of some of them). However, high quality comes at the expense of quantity. In generic systems, it tends to be the other way around: large numbers of teaching cases can be produced rapidly, but their teaching quality tends to be limited because all tutorial feedback is generated automatically. Here, quality improvements cannot be gained directly, but only through improvement of the knowledge base and tutorial program. Accordingly, progress tends to be small and slow, but when achieved, it positively affects a multitude of cases.

3. Knowledge Representation for Intelligent Tutors

In principle, knowledge bases for intelligent tutors have to be capable of solving clinical cases. However, it is not necessary for them to be practical decision aids. Also, high performance in decision support does not guarantee equally high performance in teaching. For a system to be useful in decision support, it has, above all, to provide correct suggestions, no matter how they are computed. Decision support systems exploit a variety of problem solving methods, including Bayesian statistics, probabilistic networks, neural networks, case comparison and heuristic rules. Not all of them are suited for use in tutorial systems. Here, one of the major requirements is to provide feedback in a form that maximally enhances students' understanding and learning. A tutorial system's explanations of why a student's choice (e.g. a hypothesis) is right or wrong should be given in a way he easily understands and memorizes. Generic systems explain their decisions in terms of their inference mechanisms. Those, like Bayesian probability theory or neural networks, which depend for their diagnostic precision on large quantities of numbers, obviously are not first choice for use in tutorial systems, no matter how good their decision support record might be.

For decision support systems, representation of causal knowledge is something of a luxury, since nobody expects them to understand the causality of their suggestions (as long as these are correct). By contrast, case-based tutorial systems should not only teach students to solve cases by correctly associating findings and diseases; in addition, they ought to enhance students' understanding of the underlying pathophysiologic relations. Good conventional tutorials often discuss pathophysiologic issues relevant in the context of their teaching cases.

However, the teaching quality of the few existing intelligent tutorials is limited in this regard. They mostly abstain from addressing questions of causality and etiology, since their knowledge bases lack the necessary representation of causal and temporal knowledge. Long's "Heart Failure Program" is a notable exception [11]. It simulates in great detail the relationships between certain physiologic and pathophysiologic states affected by the disease process as well as by therapy. This allows students to see how various therapy options affect the (hemodynamic) parameters and hence the status of the patient. Since it took many years to develop the program to its current state - it still covers only part of cardiology - it remains an open question how the approach scales up.

Scaling up is a general problem in building knowledge-based systems. Many programs perform well in their original restricted domain, but run into difficulties when extended to cover a broader area of medicine. Scaling up has been successfully achieved in ILIAD and QMR (see above).

Among the forms of knowledge representation best suited for building large teaching systems are heuristic approaches. In contrast to many other representations (e.g. Bayesian or neural networks), they are formulated by human beings and are easily understandable by fellow human beings. In contrast to causal functional models, heuristic representations have proven their efficiency for building large knowledge bases, and there is also empirical evidence that human experts rely heavily on heuristic knowledge for problem formulation [5, p 193f].

However, heuristic rules per se do not guarantee a good knowledge base structure, as the MYCIN/GUIDON example shows. To fully exploit their potential, shells should allow experts free expression of their heuristic knowledge, and experts should carefully organize and structure their knowledge base according to certain principles. Existing systems vary widely in both respects.

Because of the twofold function of the knowledge base of intelligent tutors - solving cases and explaining their solutions to students - a compromise between optimal decision support and optimal promotion of learning has to be found. From a teaching point of view, diagnostic or management rules should be as clear, short and simple as possible, so that they can be easily understood and learned. On the other hand, decision support rules need complex mechanisms to accommodate the many exceptions from the ordinary; such rules may become too complex for students to learn and memorize. Yet simplicity of rule syntax may limit a system's capability to correctly solve cases, and without that ability, no intelligent tutor is functional.

The dilemma is not unique to tutorial systems but also affects conventional teaching by human experts. When explaining their practical decision making in the classroom, they often concentrate on the essentials for the benefit of their students. The fact is that knowledge for decision making in practice and knowledge for teaching are not identical, but adapted to their specific purposes. An ideal knowledge-based tutor would behave like a good physician: using complex rules for solving cases and somewhat simplified ones for teaching.

Determining the appropriate level of detail for the representation of findings is a related problem. A restricted level of detail - as in QMR and ILIAD - has the advantage of simplifying the knowledge base, interface and tutorial dialogue. However, over-simplification or over-abstraction might decrease the learning effect for students, since real patients present with detailed symptoms.

4. Knowledge Representation in D3: A Shell for Knowledge-Based Tutorial Systems

Ever since its introduction more than 10 years ago, QMR's scheme of heuristic knowledge representation, with its legendary combination of expressiveness and simplicity, has not been surpassed. With just two parameters - evoking strength and frequency - QMR constructs symptomatic profiles of diseases that proved quite efficient in initial evaluations [13]. The relatively low diagnostic specificity recently reported by Berner [1] nevertheless indicates a lack of precision. Probably only a minor part of it is attributable to suboptimal scoring of findings with the two mentioned parameters. The major part of the deficiency must seemingly be ascribed to the simplicity of the knowledge representation, which does not allow rating the evoking strength of combinations of findings: QMR allows only one-finding-one-diagnosis relations. However, combinations of findings are often diagnostically more relevant than the sum of their individual evidence.
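To illustrate the limitation, here is a deliberately simplified sketch of a one-finding-one-diagnosis scoring scheme in the spirit of QMR's two parameters. The weights and the aggregation rule are invented for illustration and do not reproduce QMR's actual algorithm; the point is only that every finding is scored in isolation, so no condition can ever refer to a combination of findings:

```python
# Simplified one-finding-one-diagnosis scoring in the spirit of QMR's two
# parameters. Weights and aggregation are invented for illustration.

PROFILE = {  # hypothetical profile of one disease
    "exertional dyspnea":  {"evoking_strength": 2, "frequency": 4},
    "diastolic murmur":    {"evoking_strength": 4, "frequency": 3},
    "wide pulse pressure": {"evoking_strength": 3, "frequency": 3},
}

def score(present):
    """Add evoking strength for each present finding; penalize the absence
    of findings the disease usually shows, in proportion to frequency."""
    s = 0
    for finding, w in PROFILE.items():
        if finding in present:
            s += w["evoking_strength"]   # positive rule for the finding
        else:
            s -= w["frequency"] // 2     # negative rule for its absence
    return s

# Each finding contributes in isolation: no rule can refer to a combination.
print(score({"diastolic murmur", "wide pulse pressure"}))
```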

Recognition of the necessity to weigh combinations of findings in medical knowledge bases was a major insight when we began developing our own shell for diagnostic problem solving. D3 [8, 16] achieves diagnostic precision without sacrificing the explainability and memorizability of its heuristic rules by two main methods. The first aggregates multiple findings into abstractions from which other (higher-level) abstractions or diagnoses can be inferred. According to the terminology of the inference structure for heuristic classification introduced by Clancey [3], there are two types of abstraction: "data abstractions" - e.g. arterial hypertension - are often definitions inferred by categorical rules, whereas "solution abstractions" - e.g. right heart failure - resemble diagnoses and can also be inferred by probabilistic rules.
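A minimal sketch of the two abstraction types follows; the thresholds and finding sets are assumptions chosen for illustration, not excerpts from the actual D3 knowledge base:

```python
# Illustrative sketch of the two abstraction types named above.
# Thresholds and finding sets are assumptions, not the real knowledge base.

def arterial_hypertension(systolic, diastolic):
    # "data abstraction": a categorical definition over raw measurements
    return systolic >= 140 or diastolic >= 90

def right_heart_failure(findings):
    # "solution abstraction": inferred from lower-level evidence much like
    # a diagnosis (probabilistic in D3; reduced here to a simple count)
    evidence = {"elevated jugular venous pressure", "peripheral edema",
                "hepatomegaly"}
    return len(findings & evidence) >= 2
```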

The second mechanism concerns D3's rule syntax. Like MYCIN and in contrast to QMR, D3 allows combinations in the conditions of its rules. Initially, these were groups of findings combined logically by "and" or "or". It soon became apparent, however, that these simple combinations were not sufficient to capture all the multiform relations between findings and diseases. Also, along with the improvement in the precision of the knowledge base came an increase in its complexity. The reason is the rigidity of the rule syntax: if only one finding of a combination is missing, the respective rule cannot fire. To ensure adequate performance, one has to define rules for every relevant permutation. The number of rules thus rapidly increases until only their author understands their structure. Obviously, allowing simple combinations did not provide a significant advantage over QMR's simpler scheme of knowledge representation. A more powerful syntax seemed necessary to reconcile the conflicting goals on a higher level.

As a result, D3 now employs rules with an n-of-m syntax: here, the rule designer specifies how many findings (n) of a defined set of findings (m) must be present (minimally and maximally) for the rule to fire. A simple example of a diagnosis derived from 10 equally rated findings could read as follows:

* if 8-10 findings are present, then the diagnosis is definite

* if 5-7 findings are present, then the diagnosis is probable

* if 3-4 findings are present, then the diagnosis is suspected.

Each of the conditions of these rules covers a multitude of constellations of the 10 findings. The number of rules needed to score the diagnosis (as suspected, probable, or definite) thus decreases to just 3, accompanied by a gain in clarity and transparency. A scoring mechanism like QMR's needs 20 rules (two for each finding) for the same task.
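The three n-of-m rules of the example can be sketched as follows (a hypothetical textual encoding; D3's actual rule language is graphical and richer):

```python
# The three n-of-m rules of the example above: 10 equally rated findings,
# one rule per diagnostic status. The encoding is hypothetical.

def diagnosis_status(present, profile):
    """profile: the m findings of the diagnostic profile (a set);
    present: the findings observed in the case (a set)."""
    n = len(present & profile)   # how many of the m findings are present
    if 8 <= n <= 10:
        return "definite"
    if 5 <= n <= 7:
        return "probable"
    if 3 <= n <= 4:
        return "suspected"
    return None
```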

However, since the diagnostic impact of multiple findings usually differs, simple n-of-m rules are not sufficient for diagnosis. Therefore, D3 extended the concept of simple n-of-m rules to so-called "super-rules": n-of-m rules whose elements may themselves be n-of-m constructs. This powerful syntax allows elegant and compact expression of most symptomatic profiles. In particular, it is possible to capture diagnostic definitions involving minor and major criteria, an approach traditionally used in rheumatology and becoming more popular in other medical domains. A simple example could read as follows: if at least one major and at least two minor findings are present, then the diagnosis is definite.

Exact expression of such a sophisticated rule (or even an equivalent approximation) is impossible with a simple scoring scheme like QMR's. Here, the advantage of "super-rules" becomes apparent: using their syntax, the condition can be represented as a combination of two subconditions (majors and minors), each of which has an internal n-of-m structure. Window 1 shows a corresponding example from the knowledge base of CARDIO-PARACELSUS for inferring "congestive heart failure" as defined in the Framingham Study [12]. The upper part of window 1 lists the minor criteria, the lower part the major criteria.

Window 1 here
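The nested structure of such a super-rule can be sketched like this (the criteria sets are abbreviated Framingham-style examples given for illustration only; they do not reproduce the actual CARDIO-PARACELSUS rule shown in window 1):

```python
# Sketch of a nested "super-rule": two subconditions (majors, minors), each
# itself an n-of-m construct. Criteria sets abbreviated for illustration.

MAJOR = {"paroxysmal nocturnal dyspnea", "neck vein distention", "rales",
         "cardiomegaly", "acute pulmonary edema", "S3 gallop"}
MINOR = {"ankle edema", "night cough", "dyspnea on exertion",
         "hepatomegaly", "pleural effusion", "tachycardia"}

def n_of_m(findings, members, n_min, n_max=None):
    """Elementary n-of-m condition: between n_min and n_max (inclusive)
    members of the set are present; n_max=None means no upper bound."""
    n = len(findings & members)
    return n >= n_min and (n_max is None or n <= n_max)

def congestive_heart_failure(findings):
    # super-rule: at least one major AND at least two minor criteria
    return n_of_m(findings, MAJOR, 1) and n_of_m(findings, MINOR, 2)
```

A single such rule covers every permutation of majors and minors that satisfies the two subconditions, which is exactly the compactness argued for above.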

D3's super-rules restore much of the clarity and transparency of knowledge representation initially lost when simple combinations were included in diagnostic profiles. The resulting effect on the knowledge base is comparable to QMR's legendary scheme of heuristic knowledge representation: by scoring only single findings, QMR keeps its conditions maximally simple. As a consequence, its diagnostic profiles usually contain multiple rules (two - one positive and one negative - per finding). To determine the status of a diagnosis (e.g. suspected, probable, certain), the sum of the weights of fired rules has to be compared to thresholds. By contrast, D3's syntax considerably lowers the necessary number of rules. Ideally, there is only one rule for each status of a disease, namely when all constellations of findings resulting in the same status can be expressed in a single condition.

The expert system shell D3 [8, 16] specializes in diagnostic problem solving. It offers several alternatives for its main modules, allowing the user to configure the shell according to his specific needs. He can choose among four problem solving methods (heuristic, case-based, set-covering and Bayesian), and the interface and dialogue components also come in different forms to suit the needs and preferences of different users. Since a tutorial module was added, the shell also provides a platform for intelligent tutors.

The lack of tutoring shells has been identified as one of the major obstacles to a wider deployment of such systems [22]. Most of the decision support systems for different medical domains built with D3 have recently been transformed into intelligent tutorials, including programs for diagnosis in rheumatology [19], neurology [17, 18], hematology [14], hepatology [6] and cardiology. While in different stages of development and evaluation, they all share as their primary goal the promotion of teaching and therefore prefer solving problems by heuristic reasoning. Ultimately, we plan to cover the entire field of internal medicine with a set of cooperating subspecialty tutorial systems.

5. A Tutorial Session with CARDIO-PARACELSUS

CARDIO-PARACELSUS is a knowledge-based system for teaching case solving in cardiology. Its knowledge base of about 0.7 MB represents approximately 500 findings for inferring about 100 cardiologic diseases. The knowledge base is not yet complete and is currently undergoing thorough restructuring. It has not yet been evaluated in field trials.

When teaching students the art of solving cardiology cases, CARDIO-PARACELSUS successively discloses the symptoms and signs of a selected case while the student states his suspected diagnoses, indicates tests for their evaluation, chooses a definite diagnosis and explains it in terms of the findings of the case. The program comments upon these decisions after comparing them with its own conclusions (assumed to be correct after verification of the system's handling of each case).

The tutorial system runs in two modes. In the "free" mode for advanced learners, the student is presented at the start with the symptoms and findings of the history and physical examination of a case. He then has to navigate completely on his own: the program does not feed him any more data unless he explicitly asks for them, i.e. the results of specified tests. CARDIO-PARACELSUS then comments upon the appropriateness of the indicated test and discloses its results, provided they are available. At any time the student is free to suggest a diagnosis and promptly receives feedback. In case of disagreement he can justify his solution by citing the findings believed to be in its favor, whereupon the system will point out his errors. Alternatively, the student can ask the system to cite the current (case-specific) or general evidence for any diagnosis he is interested in.
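Schematically, the free-mode protocol can be sketched like this (a self-contained toy in which all data structures and names are invented for illustration):

```python
# Schematic sketch of the free-mode dialogue: the student drives, the tutor
# only reacts by comparing each action with its knowledge base. All names
# and data structures are hypothetical.

from dataclasses import dataclass

@dataclass
class Case:
    history: str
    results: dict            # test name -> result text
    indicated_tests: set     # tests the knowledge base would order
    diagnoses: set           # diagnoses the knowledge base concludes

def free_mode(case, actions):
    print("History/physical:", case.history)
    for kind, value in actions:                 # student-driven loop
        if kind == "test":
            ok = value in case.indicated_tests
            print(f"Test {value}: {'reasonable' if ok else 'not indicated'}")
            if value in case.results:
                print("Result:", case.results[value])
        elif kind == "hypothesis":
            ok = value in case.diagnoses
            print(f"Hypothesis {value}: {'agreed' if ok else 'not supported'}")

free_mode(Case("exertional dyspnea, palpitations",
               {"echocardiography": "severe aortic regurgitation"},
               {"echocardiography", "ECG", "chest X-ray"},
               {"aortic regurgitation"}),
          [("test", "echocardiography"), ("hypothesis", "aortic regurgitation")])
```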

In the guided mode, recommended for beginners, the student does not have to choose and ask for data. Rather, CARDIO-PARACELSUS presents him all the available case information in a step-by-step fashion, beginning with the history and followed by the physical exam, basic technical tests and specific technical tests. After presenting each packet of information, the program asks the student for his suspected diagnosis, thus making sure he actually digests the data. Otherwise the guided mode resembles the free mode.

The following sequence of windows is taken from a tutorial session. It deals with the case of a 42-year-old patient with the characteristic findings of aortic regurgitation. His presenting symptoms are exertional dyspnea, palpitations and general fatigue. At the start of the session, CARDIO-PARACELSUS presents the history information available for the case. The data are structured and presented in hierarchical fashion. In order not to overwhelm the student with a multitude of detailed information, the program initially lists only the headings of the findings. Clicking on the triangles before the headings successively produces the included detailed data, as exemplified in window 2 for "dyspnea".

Window 2 here

After familiarizing the student with the history data, CARDIO-PARACELSUS asks him to indicate his diagnostic hypothesis in the program's hierarchically structured disease list, of which window 3 shows only a section. By clicking on the triangle in front, the student can expand each disease category to the desired level of detail to find and mark his suspected diagnosis. In the example, he does not have a specific suspect and indicates a broad category (aortic disease) at the highest level of the hierarchy.

Window 3 here

CARDIO-PARACELSUS compares the student's hypothesis with its own suspect (based on the same data). Because of disagreement (the program suspects cardiac disease), the generated comment is negative (window 4).

Window 4 here

Next, CARDIO-PARACELSUS presents the available data of the physical examination and again asks the student for his opinion. He now has a better idea of what is wrong with the patient and suggests rheumatic (valvular) heart disease as the diagnosis, finding it by appropriately expanding the disease hierarchy (cf. window 3). At this point the intelligent tutor wants to know how strongly the student believes in his hypothesis (suspected or confirmed). In our example, the tutor agrees with the student's choice. However, since he missed other hypotheses entertained by the program, his rating is only "acceptable" (window 5).

Window 5 here

After completion of history and physical, CARDIO-PARACELSUS asks the student to indicate one or more investigations for further exploration of his suspected diagnosis. The student chooses chest X-ray, ECG and echocardiography from a list of possible diagnostic investigations. The tutor compares the student's choice to its own course of action in the same situation. Since it would also opt for these tests, it congratulates him on his reasonable choice (window 6).

Window 6 here

Now the conscientious student wants to make sure he is right to send the patient to the echocardiography lab. To this end, the program lets him mark the diagnosis/hypothesis he wants to explore by echo. The student selects "aortic regurgitation". To find out whether this is a good idea, CARDIO-PARACELSUS' tutorial program asks its knowledge base for advice, which responds that it would also opt for an echo. The tutorial program always believes what the knowledge base says. It hence concludes that doing an echo is reasonable in the current situation and gives the student corresponding feedback (window 7).

Window 7 here

CARDIO-PARACELSUS reports to the student the results of the three indicated investigations (chest X-ray, ECG, echocardiography). When he knows all the case data - or enough to feel sure about giving a final assessment of the case - the student selects his definite diagnoses from the accordingly expanded disease hierarchy (cf. window 3). Since the student recognized 50 % of the program's diagnoses (suspected and confirmed), the electronic teacher is satisfied with his performance and claps his hands in approval (window 8).

Window 8 here

Clicking on "Diagnoses" at the bottom of window 8 produces all the diagnoses of the case entertained by CARDIO-PARACELSUS. In conclusion, the program asks the student to explain his diagnosis by citing the findings in its favor. He selects the appropriate findings from the case master list. The program rates his choice by comparing it to the case findings actually in favor of "aortic regurgitation" (as indicated by its knowledge base). In the feedback of window 9, findings correctly cited by the student are printed in bold type, while those he missed appear in normal print. Upon demand, the program shows all findings indicative of the diagnosis (including those not present in the case).

Window 9 here

6. Discussion

The quality of a generic case-based tutorial system depends upon the quality of its knowledge base and of the tutorial component conducting the dialogue with the student. The most basic prerequisite is a correct knowledge base, because it is what makes the electronic teacher knowledgeable. Obviously, if he does not know the correct answers to all questions arising in the context of tutorial sessions, he cannot tell his students when they are right or wrong. Once the quality of the knowledge base satisfies the developer's expectations, emphasis shifts to the design of the tutorial program.

The tutorial program rates students' performance on two tasks: suggesting diagnoses and ordering tests. In addition, for each of these choices the student can give an explanation or justification and have it rated as well.

Correct automatic rating of student performance on cases with a single diagnosis is simple. However, CARDIO-PARACELSUS usually produces multiple diagnoses, because each aspect of a disease - e.g. severity, location, etiology - is represented for technical reasons as a separate diagnosis. If a student omits a diagnosis entertained by the tutor that is not covered by a more specific one, his diagnostic performance rating is lowered. Also, the program does not distinguish between major and minor (unrelated) diagnoses of a case: omission of the major one results in the same lowered rating as omission of a minor diagnosis.
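The rating scheme can be sketched as follows (the hierarchy encoding is a hypothetical illustration): a student's diagnosis covers a tutor diagnosis if it is the same or a more specific one in the disease hierarchy, and every uncovered tutor diagnosis lowers the rating by the same amount, whether it is major or minor:

```python
# Sketch of hierarchy-aware diagnosis rating. Hierarchy encoding is a
# hypothetical illustration of the behavior described above.

PARENT = {"aortic regurgitation": "aortic valve disease",
          "aortic valve disease": "valvular heart disease"}  # child -> parent

def covers(student_dx, tutor_dx):
    """True if the student's diagnosis equals the tutor's or is more specific."""
    d = student_dx
    while d is not None:
        if d == tutor_dx:
            return True
        d = PARENT.get(d)          # climb the disease hierarchy
    return False

def rating(student, tutor):
    covered = [t for t in tutor if any(covers(s, t) for s in student)]
    return len(covered) / len(tutor)   # major and minor diagnoses weigh the same
```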

Scoring students' justifications for their diagnoses should take into account the different impact and importance of the various findings. For example, citation of the two decisive findings should be rated higher than citation of five marginal ones. For a program with a knowledge representation like QMR's, this is easy because all findings are weighted independently and individually. In principle, D3's tutorial component is also capable of differential weighting of findings. However, this task is made difficult by the complexity of the knowledge representation. Often, no direct links between findings and diagnoses exist, but only indirect ones via computed intermediate states (pathophysiologic conditions). Currently, students cannot cite these as evidence for a diagnosis, but only directly observable findings (which are common to all diagnostic models in D3, whereas intermediate states are model-specific). The individual diagnostic contributions of observable findings are approximated by proportionate attribution of evidence. Thus, the strong diagnostic impact of a characteristic combination of findings or of a pathophysiologic condition is converted into the sum of small impacts of the many implicated individual findings. In other words, when rating students' explanations of their diagnoses, D3 behaves as if its knowledge representation were similar to QMR's.
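The proportionate attribution can be sketched as follows (the numbers are invented): the evidence a rule contributes via a combination or an intermediate state is split evenly among the observable findings behind it:

```python
# Sketch of proportionate attribution of evidence. Numbers are invented.

def attribute_evidence(rules):
    """rules: list of (set of observable findings implicated in a rule,
    rule score). Returns each finding's approximated contribution."""
    contribution = {}
    for findings, score in rules:
        share = score / len(findings)   # even split over the implicated findings
        for f in findings:
            contribution[f] = contribution.get(f, 0.0) + share
    return contribution

# A strong combination rule (score 40 over 4 findings) dissolves into four
# small individual contributions of 10 each.
weights = attribute_evidence([({"dyspnea", "rales", "cardiomegaly",
                                "S3 gallop"}, 40.0)])
```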

The second student task rated by D3's tutorial system is the choice of an investigation. This again is simple as long as CARDIO-PARACELSUS itself indicates only one test. When it entertains multiple tests, the rating gets tricky because the sequence has to be taken into account. Currently it is a problem for the program itself to establish a reasonable sequence, because of the interference of two indicating mechanisms operating in the knowledge base, one categorical (e.g. if finding X is observed, then do test Z), the other probabilistic (e.g. if disease Y is suspected, then do test Z). Since the resulting sequence is not always satisfactory, the tutorial program lacks a reliable reference for comparing and rating students' test choices.
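The interference can be sketched as follows (the rule content is invented): each mechanism independently proposes tests, and the resulting union carries no inherent order:

```python
# Sketch of the two test-indication mechanisms. Rule content is invented.

CATEGORICAL = {"diastolic murmur": "echocardiography"}       # finding -> test
PROBABILISTIC = {"aortic regurgitation": "echocardiography",
                 "congestive heart failure": "chest X-ray"}  # suspected dx -> test

def indicated_tests(findings, suspected):
    tests = {t for f, t in CATEGORICAL.items() if f in findings}
    tests |= {t for d, t in PROBABILISTIC.items() if d in suspected}
    return tests   # an unordered set: sequencing it is the open problem
```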

The same complexity of knowledge representation that improves performance of CARDIO-PARACELSUS as a decision support system simultaneously complicates good performance as an intelligent tutor.

CARDIO-PARACELSUS presents all case findings verbally. Presently, it is incapable of giving audiovisual illustrations. For example, when it comes to heart auscultation, students do not have to listen to a recorded sound track and recognize its features, but are told the (correct) interpretation directly. For a tutorial system designed to teach case solving in a simulated medical environment, this is of course a serious disadvantage. For conventional programs, audiovisual illustrations have been standard practice almost from the beginning. Within the framework of a knowledge-based system, however, this is considerably more complex if the advantage of rapid case generation is not to be lost, and it has not yet been realized. Only after the knowledge base is complete and shown to be able to correctly diagnose a large pool of tutorial cases can program developers begin to concentrate upon audiovisual illustrations for these cases and their findings. For CARDIO-PARACELSUS, we plan to add audiovisual illustrations over the next 3 years.

For major findings, several illustrating examples are required, preferably ones demonstrating the variability of clinical findings. Once the library of audiovisual illustrations is complete, it is possible to draw upon it to "decorate" the program's tutorial cases. We plan to illustrate tutorial cases by automatically selecting for each finding a corresponding illustration from the audiovisual library, thus greatly speeding up the process. Provided certain constraints are taken into account (sex, age, ethnicity etc.), we hope the results will be compatible (the multilocal origin of the illustrations should be unnoticeable). Here again, the generic character of the knowledge-based approach becomes apparent. If the concept works, multimedia tutorial cases could be produced almost as rapidly as the case data are fed into the program (cf. Fig. 1).
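The planned selection step might look like the following sketch; all metadata fields and the matching policy are our assumptions, not a description of an existing implementation:

```python
# Sketch of constraint-based selection of audiovisual illustrations.
# Metadata fields and matching policy are assumptions.

from dataclasses import dataclass

@dataclass
class Clip:
    finding: str
    sex: str
    age_range: tuple    # (lowest, highest) patient age the clip fits

def pick_clip(library, finding, sex, age):
    """Pick an illustration matching the finding and the patient constraints,
    so that clips of multilocal origin blend into one coherent case."""
    for clip in library:
        low, high = clip.age_range
        if clip.finding == finding and clip.sex == sex and low <= age <= high:
            return clip
    return None

library = [Clip("diastolic murmur", "male", (30, 60))]
print(pick_clip(library, "diastolic murmur", "male", 42))
```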

The emphasis of CARDIO-PARACELSUS is on teaching diagnostic case solving. Patient management aspects are only marginally touched upon, because the program currently lacks the temporal model of hemodynamic states and their relationships that would be necessary to simulate the effects of drugs or other therapeutic interventions.

The main criteria differentiating knowledge-based from conventional tutorials are summarized in table 1.

Table 1 here

Are knowledge-based tutorials really better than conventional ones? In table 1, advantages and disadvantages appear almost evenly distributed. One might argue that knowledge-based systems only turned to tutorial applications after failing in practical decision support. Presently, conventional tutorials dominate the market. But will they be strong enough to defend their lead against the artificial intelligence newcomer?

Hand-crafted tutorial cases might preserve their qualitative lead over automatically generated ones, which tend to suffer from insufficiencies of both their knowledge bases (e.g. no representation of deep knowledge) and their tutorial components. However, as in industry at large, mechanization and automation finally take over the majority of the market, provided their products meet certain quality standards. The distribution success of the neurology tutor [17, 18] - freely available via a voucher included in a standard German neurology textbook - seems to point in this direction.

Another advantage of knowledge-based programs is their flexibility and adaptability. When their handling, and consequently teaching, of a case has become outdated - e.g. after the introduction into practice of a new diagnostic test or treatment - it can easily be improved by changing the corresponding part of the knowledge base. By contrast, changing case handling in a conventional program is difficult because all details are hard-coded and hence require extensive and explicit recoding, the effort of which is comparable to creating new cases. As a consequence, conventional cases are likely to remain a scarce commodity.

Possibly, conventional and knowledge-based tutorial systems will be used for different purposes, complementing rather than competing with each other. Conventional tutors can be viewed as a hypermedia extension of modern textbooks, which, in order to enhance their readers' understanding, illustrate their chapters with didactic cases and examples. The emphasis here is clearly on primary learning. By contrast, knowledge-based tutors are better suited for training cognitive capabilities to improve practical performance. By training, we mean frequent repetition of similar tasks. By providing hundreds of cases, knowledge-based systems allow the simulated frequent repetition suitable for acquiring a form of "laboratory" routine and experience useful for physicians beginning to practice.

In conclusion, we would like to emphasize the limits of simulating medical environments to train students to solve cases. No matter how good the simulation is - even involving the best of multimedia techniques - it is still very unlike the real situation at the bedside. Familiarity with the reality of the sick patient seems to be a necessary precondition for exploiting the enormous potential of the new learning method. Framed after the disease-oriented structure of typical textbooks, the knowledge representation of many students is standing on its head. Turning it onto its feet (problem-oriented) was until recently the privilege of practical involvement. By offering students a convenient way to train case solving early in their education, tutorial systems featuring case simulation have the potential to help with this process.

Acknowledgement

The research reported in this paper was supported by a grant from the German Ministry of Research and Technology, MEDWIS-project A47 (Entwicklung und Evaluation medizinischer Diagnostik-Expertensysteme zur Wissensvermittlung).

Literature

[1] E. Berner et al., Performance of four computer-based diagnostic systems, N Engl J Med 330(1994) 1792-1796.

[2] B. Buchanan, E. Shortliffe, Rule-Based Expert Systems - The MYCIN Experiments, (Addison-Wesley, 1984).

[3] W. Clancey, Heuristic Classification, Artificial Intelligence 27(1985) 289-350.

[4] W. Clancey, Knowledge-Based Tutoring: The GUIDON Program, (MIT Press, 1987).

[5] A. Elstein, L. Shulman, S. Sprafka, Medical Problem Solving, (Harvard University Press, 1978) p 193f.

[6] C. Engler, A. Führer, P. Hempel et al., HEPA-CADS: Development of tutorially usable expert systems for diagnosis in hepatology (in German), accepted for publication in: Informatik, Biometrie und Epidemiologie in Medizin und Biologie, 1995.

[7] D. Fontaine, P. Beux, C. Riou et al., An intelligent computer-assisted instruction system for clinical case teaching, Meth. Inform. Med. 33(1994) 433-445.

[8] U. Gappa, F. Puppe, S. Schewe, Graphical knowledge acquisition for medical diagnostic expert systems, Artificial Intelligence in Medicine 5(1993) 185-212.

[9] M. Inui, Fundamental research on simulation based intelligent tutoring systems, in: Proc. of the World Congress on Expert Systems, (Pergamon Press, 1991) Vol 1, 329-336.

[10] M. Lincoln et al., ILIAD training enhances students' diagnostic skills, J. Med. Systems 15(1991) 93-110.

[11] W.J. Long, S. Naimi, M.G. Criscitiello, Evaluation of a new method for cardiovascular reasoning, J. Am. Med. Inform. Assoc. 1(1994) 127-141.

[12] P.A. McKee, W.R. Castelli, P.M. McNamara et al., The natural history of congestive heart failure: the Framingham study, N Engl J Med 285(1971) 1441-1446.

[13] R. Miller, F. Masarie, Use of the Quick Medical Reference (QMR) program as a tool for medical education, Meth. Inform. Med. 28(1989) 340-345.

[14] B. Puppe, Building a medical knowledge base: Tricks facilitating the simulation of the expert's reasoning, in: Proceedings of AIME '93: 5th Conference on Artificial Intelligence in Medicine Europe, (Elsevier, Amsterdam, 1993).

[15] F. Puppe, Intelligent Tutoring Systems (in German), Informatik-Spektrum 15(1992) 195-207.

[16] F. Puppe, K. Poeck, U. Gappa et al., Reusable Modules in a Configurable Classification Shell (in German), Künstliche Intelligenz 2(1994) 13-18.

[17] F. Puppe, B. Reinhardt, K. Poeck, Generated Critic in the Knowledge Based Neurology Trainer, accepted for AIME-95, 1995.

[18] F. Puppe, B. Reinhardt, K. Poeck, Case-Oriented Neurology Trainer (in German), Künstliche Intelligenz 1(1995) 52-54.

[19] S. Schewe, Expert system "Rheuma" on a PC in clinical validation, Brit. J. Rheumatol. 31(1992) Suppl. II, p 150.

[20] E. Wenger, Artificial Intelligence and Tutoring Systems, (Morgan Kaufmann, 1987).

[21] B. Woolf, Intelligent tutoring systems: a survey, in: H. Shrobe, ed., Exploring Artificial Intelligence, (Morgan Kaufmann, 1988).

[22] B. Woolf, AI in education, in: S. Shapiro, ed., Encyclopedia of Artificial Intelligence, 2nd edition, (Wiley, 1992) Vol 1, 434-444.

Figure 1: Comparison between knowledge-based and conventional tutorial systems: how development effort rises with the number of cases

Window 1: Condition of the super-rule for deriving the diagnosis "congestive heart failure". The upper half comprises the minor diagnostic criteria, the lower half the major criteria. The advantage of such a rule syntax is the possibility of covering a wide variety of permutations with a single rule.

Window 2: Presentation of historical data in tutorial session with CARDIO-PARACELSUS

Window 3: Student enters "Aortic Disease" as diagnostic hypothesis

Window 4: CARDIO-PARACELSUS criticizes the diagnostic hypothesis "aortic disease" entered by a student after studying historical data

Window 5: CARDIO-PARACELSUS criticizes the diagnostic hypothesis "valvular heart disease, etiology rheumatic fever" entered by a student after studying examination data

Window 6: CARDIO-PARACELSUS approves a student's indication for an ECG after completing the physical examination

Window 7: CARDIO-PARACELSUS accepts the diagnostic hypothesis "aortic regurgitation" as justification for a student's ordering of an echocardiogram

Window 8: CARDIO-PARACELSUS approves of the final diagnostic assessment of the case given by the student

Window 9: CARDIO-PARACELSUS evaluates the student's explanation, i.e. findings given in support of his diagnosis "aortic regurgitation"

Table 1: Comparison between knowledge-based and conventional tutorial programs