MACHINE MEDIATED LEARNING, 5(3&4), 199-219

1995, Lawrence Erlbaum Associates, Inc.

Generating Case-Oriented Training From Diagnostic Expert Systems

 

Frank Puppe, Bettina Reinhardt

Department of Computer Science

Würzburg University

Training by diagnosing simulated cases is a powerful learning technique. Although many simulation programs are available, the branching logic in most of them is explicitly predefined by their authors, making these programs rather inflexible to use and expensive to build. We present a shell for building case-oriented training systems in which no branching logic is specified; instead, the underlying knowledge is extracted from an expert system. The interface for presenting cases and commenting on the user's actions in solving them is generated automatically. The case data is presented incrementally, and the students may iteratively choose the most useful tests for clarification of the case; they also have to state their intermediate and final diagnoses. The system can follow the students, that is, it uses the same data for evaluating and criticizing their actions based on explicitly coded strategic and structural knowledge. Among other applications, the shell has been used for building three rather large training systems in neurology, rheumatology, and flower classification, which were well accepted by students in first external tests.

 

Learning programs for medical education have become quite popular. Eysenbach (1994) contains a comprehensive overview. He uses the following categories:

Most commercial medical education software is based on the multimedia presentation of knowledge being accessed associatively (see Eysenbach, 1994, pp. 266-271). One important application is education for patients with chronic diseases via hypermedia programs in physicians' waiting rooms. There are transitions and combinations among the aforementioned categories, for example, tutorial programs based on hypertext technology in which alternative answers to questions are represented by special "links" to other windows.

Even patient simulation programs can be developed using only hypermedia techniques by including questions for tests, hypotheses, or therapies at certain points in the patient presentation. However, it is time consuming to build these programs, because every patient simulation has to be built from scratch. Therefore, the number of case simulations is rather limited in these programs, and students do not get the chance to test their knowledge on a variety of similar, but different cases.

An attractive alternative is generating the patient simulations from a generic knowledge base. This approach requires a large initial effort to build the knowledge base. After that, adding new cases becomes very simple. Several general overviews of intelligent tutoring systems have been written (e.g., F. Puppe, 1992; Wenger, 1987; Woolf, 1988). The first intelligent tutoring system based on an expert system was GUIDON (Clancey, 1987), built on top of the well-known MYCIN and EMYCIN (Buchanan & Shortliffe, 1984). Insights gained from this work include the following:

By adding knowledge to the MYCIN knowledge base, Clancey proved that tutorial systems can be built on top of expert systems. However, for good tutorial systems, the requirements for the structure and contents of the problem-solving method and knowledge base are considerably higher than for expert systems.

Whereas GUIDON remained a research prototype, ILIAD (Lincoln, 1991) and QMR (Miller & Masarie, 1989) are commercially available tutorial systems, based on much larger knowledge bases in internal medicine. Their diagnostic problem-solving capability has recently been evaluated by Berner et al. (1994). The results indicate high sensitivity and low specificity of program diagnoses. Consequently, although sometimes unable to find the exact correct diagnoses, they are useful for supporting a physician who is trying to consider the full spectrum of alternative diagnoses. A major problem of these systems is their restricted knowledge representation. For example, QMR uses only one-to-one relationships between findings and diagnoses, rated with the two evidence values "evoking strength" and "frequency". However, a constellation of findings is often much more characteristic than the sum of the findings viewed separately. Another problem is the level of detail with which findings are presented. Although the restricted level of detail of QMR or ILIAD has many advantages for a simple user interface and knowledge base, it complicates the student's task of inferring powerful abstractions from many detailed observations.

A great advantage for tutorial systems built on top of expert systems is that only those cases which the system can solve correctly may be selected for presentation to students. The general architecture of expert-system-based tutorial systems includes the basic components of expert systems designed for problem solving, knowledge acquisition, explanation, and interview, plus the specific tutorial components of case presentation and criticism (see, e.g., Inui, 1991; Fontaine, Beux, Riou, & Jacquelinet, 1994). The main issues are practical evaluations of the influence of various problem-solving methods, the sophistication of the knowledge representation, the case presentation technique's dependence on the amount of information to be presented, and the criticism component's dependence on the time and motivation of the students. Because evaluations in multiple domains require shells for building tutorial systems in different applications (the lack of tutoring shells was stated as one of the major obstacles to a larger deployment of such systems; Woolf, 1992), we present a shell for building expert-system-based tutorial simulation systems. It is applied and evaluated in the context of three large training programs in the domains of neurology, rheumatology, and flower classification.

Outline of the Diagnostic Expert System Shell Kit D3

The three training systems are based on the diagnostic expert system shell kit D3 (F. Puppe, Gappa, Poeck, & Bamberger, 1996). The key idea of the shell kit is to combine the convenience of an expert system shell with the flexibility of more general approaches for building expert systems. D3 contains module alternatives for several interviewer and problem-solving components. Moreover, due to predefined interfaces between the modules, it is possible to add new module alternatives without having to reimplement other modules. The most interesting module alternatives are the various problem-solving methods: heuristic, statistic, case-based, decision-tree-based, set-covering, and functional classification (an empirical comparison of the first three in the domain of acute abdominal pain is given in B. Puppe, Ohmann, Goos, F. Puppe, & Mootz, 1995). In order to maximize reuse of code, the problem-solving component has been subdivided into three components:

Because all of the previously mentioned problem-solving methods need data abstraction and dialog guiding, these components have been separated out as independent modules that are reused by each of the different diagnostic evaluation modules. Figure 1 gives an overview of the architecture of D3 with an emphasis on the interviewer and problem-solving components (an example of one interviewer component can be found in Figure 10).

Figure 1 Architecture of the diagnostic expert system shell kit D3 (without the knowledge acquisition component; F. Puppe et al., 1996). Main modules (interviewer component and problem-solving component) and submodules (dialog guiding, data abstraction, diagnostic evaluation) are represented by rectangles; implemented module alternatives by ellipses. Control flow is indicated by arrows; the interfaces between modules are fixed.

The knowledge acquisition component in D3 (from the tutoring system's point of view, also the "authoring component"), named CLASSIKA (Gappa, F. Puppe, & Schewe, 1993), transforms the internal object-oriented representation into an external graphical representation. Following are the three principal steps of diagnostic knowledge acquisition and their corresponding graphics:

  1. Entering of terminological knowledge into graphs and hierarchies. The terminological knowledge in classification includes data (findings, observations), data abstractions, solutions (diagnoses), and solution categories (Clancey, 1985). In addition, D3 uses so called question sets to group related data and data abstractions into units for providing structure and better dialog guiding (B. Puppe & F. Puppe, 1988).
  2. Entering local knowledge into graphical forms. Local knowledge includes names, ranges, definitions, support knowledge, and so forth, of the terms.
  3. Entering of relational knowledge into tables. Relational knowledge includes simple relations and more complex rules, which are subdivided into many categories according to the type of action of the rule (e.g., rules for inferring data abstractions, rating diagnoses, checking the plausibility of user input, and guiding the dialog).

An example for representation of the heuristic knowledge for a diagnosis in the neurology domain is given in Table 1. The regular scheme with a three-layered inference structure of findings, finding abstractions, and diagnoses holds for the neurology knowledge base but is not enforced by the shell D3. Whereas the rheumatology knowledge base has much more intermediate layers, the flower classification knowledge base has no finding abstractions or intermediate diagnoses at all, but only direct relations between flowers and their traits.
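The flavor of the heuristic rules in Table 1 can be sketched in a few lines of code. The following Python fragment is purely our own simplification, not D3's internal representation: finding names are shortened, the symbolic evidence categories P6/P3/N4 are mapped to illustrative numbers, and the column-linking operators "&" and n-from-m become small predicate functions.

```python
# Sketch of D3-style heuristic rules (our simplification, not D3's internals).
# A rule fires if its precondition holds on the observed findings and then
# contributes positive (P) or negative (N) evidence to a diagnosis.

def conjunction(conditions, findings):
    """The '&' column linking of Table 1: all preconditions must hold."""
    return all(c in findings for c in conditions)

def n_from_m(conditions, findings, n_min, n_max=None):
    """The n-from-m column linking: at least n_min (and at most n_max)
    of the listed conditions must be present among the findings."""
    hits = sum(1 for c in conditions if c in findings)
    return n_min <= hits <= (n_max if n_max is not None else len(conditions))

# Hypothetical rules, loosely following Table 1 (finding names shortened).
RULES = [
    (lambda f: conjunction({"alcohol consumption", "axonal distal PNP",
                            "tremor"}, f), +6),                     # ~P6
    (lambda f: conjunction({"alcohol consumption",
                            "complaints of distal PNP"}, f), +3),   # ~P3
    (lambda f: "axonal distal PNP" not in f, -4),                   # ~N4
]

def evidence(findings):
    """Total evidence for the diagnosis from all applicable rules."""
    return sum(score for condition, score in RULES if condition(findings))
```

Note that the real shell combines symbolic evidence categories rather than summing numbers; the sum above merely illustrates how several weak rules (P3) and one strong rule (P6) can cooperate, and how a negative rule (N4) can veto a diagnosis.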

To build a training system based on an existing expert system in D3, it is only necessary to add or select cases the expert system can solve correctly. The training component presents a case stepwise to the students, asks them for their current hypotheses, and criticizes the answers based on the contents of the knowledge base. For criticizing diagnostic hypotheses of students, the training component first compares its own hypotheses (inferred from the data already presented) with the students' diagnoses. If the students have suggested additional diagnoses, the training component computes their ratings. If the students want to justify a diagnosis, they must select and rate all findings from the case presentation that they think support the diagnosis in question. To criticize the justification, the trainer has to simplify its own justification by eliminating all intermediate diagnostic conclusions (e.g., in the neurology domain, the finding abstractions). Of course, there is some loss of information in this process. An advantage is that the resulting relations can be computed regardless of the kind of knowledge and problem-solving method used (heuristic, causal, statistic, etc.).
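The comparison underlying the hypothesis criticism can be sketched as follows; all names are our own, and the rating function stands in for the expert system's evaluation of a diagnosis on the data presented so far.

```python
def criticize_hypotheses(system_hyps, student_hyps, rating):
    """Sketch of the hypothesis criticism (names are ours, not D3's).
    Compare the tutor's own hypotheses, inferred from the data presented
    so far, with the student's, and rate any additional student suggestion.
    rating(diagnosis) -> evidence score for a diagnosis on the current data."""
    confirmed = [d for d in student_hyps if d in system_hyps]
    missed = [d for d in system_hyps if d not in student_hyps]
    extra = {d: rating(d) for d in student_hyps if d not in system_hyps}
    return {"confirmed": confirmed, "missed": missed, "extra_rated": extra}
```

For example, a student who suggests "radiculopathy" in addition to the system's hypotheses would be shown how the knowledge base rates that diagnosis on the currently known findings, rather than a flat "wrong".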

Because the strategic knowledge for dialog guiding is separated in D3 from the structural knowledge for inferring diagnoses, the same scheme can be used for criticizing the selection of tests by the user. The knowledge for indicating tests is represented either by categorical rules (e.g., "if a diagnostic category is established, then perform some standard tests for differentiation among successor diagnoses") or by a cost/benefit analysis (e.g., with a list of tests for confirming a certain diagnostic hypothesis that is possible but not established). If a diagnosis is either confirmed, ruled out, or below the threshold for being suggested, then the rest of its list of tests will not be performed. If the students want to pursue a test, they have to state either the diagnoses they want to pursue or the condition for performing that test (if it is a "standard" test). The criticism is again computed from a comparison of the system's tests with the students' tests and, if the students suggest other tests, a computation of their cost/benefit ratios based on the available data.
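The cost/benefit side of test indication can be sketched as a simple ranking; the catalog structure and all test names below are invented for illustration and do not reflect D3's actual representation.

```python
def suggest_tests(pursued, catalog):
    """Sketch of the cost/benefit test selection (our simplification).
    catalog maps each test name to (cost, {diagnosis: expected benefit});
    tests are ranked by their benefit/cost ratio for the pursued diagnoses,
    and tests without any benefit for them are dropped."""
    ranked = []
    for test, (cost, benefits) in catalog.items():
        benefit = sum(benefits.get(d, 0) for d in pursued)
        if benefit > 0:
            ranked.append((benefit / cost, test))
    return [test for ratio, test in sorted(ranked, reverse=True)]
```

A student's test choice can then be criticized by computing the same ratio for the suggested test: a cheap, highly indicative test ranks above an expensive one with little expected benefit for the stated hypotheses.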

Table 1

Chronic Alcohol-Related Polyneuropathy

                                                                  Rule 1  Rule 2  Rule 3  Rule 4
(1) Significant habits = regular alcoholic consumption               +       +       +
(2) Occurrence of weakness/paralysis = slowly progressive                            +
(3) Localisation of weakness/paralysis = right and/or left
    distal leg/foot                                                                  +
(4) Type of involuntary movements = tremor at rest or
    intention tremor                                                         +       +
(5) Localisation of involuntary movements = both arms and hands                      +
(6) -> Complaints of distal polyneuropathy                                   +
(7) -> Findings of axonal distal polyneuropathy of legs              +                       -
(8) Tremor = tremor at rest or intention tremor, right or
    left side                                                        +
(9) Sexual functions = impotent or diminished                                +
Type of column linking                                               &       &       &       &
(10) Assessment for chronic alcohol-related polyneuropathy           P6      P3      P3      N4
Note. Example of heuristic knowledge representation with tables. Each column of the table represents a rule, where the last row denotes the action and the other rows the precondition. The second-to-last row shows how the preconditions are combined (& = and; other options are v = or and n-from-m = two numbers representing the minimum and maximum number of cumulated preconditions from that column; arbitrary combinations of these operators are also possible but require a different graphical layout). The evidence ratings in the action part have a similar meaning as the evoking strengths (p1 to p6) and frequencies (n1 to n6) in INTERNIST/QMR. The four rules in the table mean:

An example of an n-from-m rule is the definition of the derived symptom "-> complaints of distal polyneuropathy" from the table above, where 4 of the following 5 conditions must hold:

The Neurology Trainer

The Neurology Trainer comprises the diagnostic part of a standard textbook for neurology (Poeck, 1994). A voucher in the textbook can be mailed to obtain the software; 2,000 copies had been delivered by July 1996. The textbook author spent about 2 years (about 1 to 2 hr per day) using D3/CLASSIKA to formalize and test the knowledge, comprising definitions of about 300 neurological terms (like "complaints of distal polyneuropathy" in Table 1) and about 120 main neurologic diagnoses (like "chronic alcohol-related polyneuropathy" in Table 1). The knowledge base has been tested with about 200 model cases that also serve as training cases for students.

Figure 2 Presentation of the history in a dynamic hierarchy. If the user clicks on the triangle in front of a term, the next hierarchical level is opened or closed. This mechanism enables the user to retain an overview even when confronted with a lot of information (the part of the history shown here is only about one fourth of the total data of this not very complicated case).

In the following, we illustrate an example session of the Neurology Trainer. When starting a session, the students choose a case either at random or according to some criteria. First, the tutor presents the history of the patient, as shown in Figure 2. To allow different data presentations (e.g., faster but less realistic or slower and more realistic), the presentation interface can be configured accordingly. Options include concentration on the pathologic data only (the necessary information is specified in the knowledge base) or forcing the user to request data items one by one.

The regular activity consists of a cycle of selecting hypotheses based on the currently available information, receiving criticism from the tutor, and then requesting new information (Figure 3 and Figure 4). The latter may proceed in one of two ways: Either there is a predefined sequence of history, physical examination, lab data, and technical examinations (tests), or the students have to order tests explicitly based on their current hypotheses (Figure 5). At any stage they can enter a detailed justification of a hypothesis, which allows a much more detailed criticism (Figure 6). In order to simplify the user interface, currently the justification must be stated in terms of the data presented to the user (see Figure 2).

Figure 3 Selection of the current hypotheses based on the information in Figure 2. Again, a dynamic hierarchy is necessary, since the user can choose from among about 200 diagnoses.

In general, the neurology trainer has been well received by those of the 2,000 users who returned a questionnaire, except that it runs too slowly on some of the users' computers (due to insufficient main memory). In particular, the large number of available training cases helped deepen the material of the textbook. There was also a lot of constructive criticism, for example, to enable online access to the textbook, to make the presentation more realistic with multimedia, or to include therapy.

Figure 4 Criticism of the diagnostic hypothesis from Figure 3 (left) and presentation of the physical examination in a currently unopened dynamic hierarchy (right). If all terms are opened, it takes about three screen pages similar to the one in Figure 3. Therefore, the technique of dynamic hierarchies is an absolute necessity.

Figure 5 Example of the selection of tests for clarification of a hypothesis and the corresponding criticism by the trainer.

Rheumatology Trainer

The rheumatology trainer deals with the approximately 70 most common rheumatic diagnoses. It was not built specifically for tutorial purposes like the neurology trainer but is an extension of a well-evaluated consultation system for rheumatology (Gappa et al., 1993). In a prospective evaluation with 51 outpatients coming from a second clinic (Schewe & Schreiber, 1993), it stated the final clinical diagnoses (which were used as the gold standard) in about 90% of the cases, and it converged exactly on the correct diagnoses in about 80% (i.e., in about 90% of the first-mentioned cases). The main modification of the knowledge base for tutorial purposes was adaptation of the lab test indications. No diagnostic rules were changed by this modification of strategic knowledge. The rheumatology trainer has been evaluated in two 4-month courses at a Munich university given by the author of the knowledge base, Dr. Schewe (Schewe, Quak, Reinhardt, & F. Puppe, 1996). A control group of students was given the same material on paper rather than on a computer. The scenario is as follows:

  1. The students learn the basics of the domain in a conventional manner.
  2. The students can study formalized expert knowledge of the domain.
  3. The teacher presents (real) patients to the students in regular lectures and discusses the cases.
  4. The tutor system presents students the same case in order to repeat the lecture and to get adapted to the tutor system.
  5. The tutor system presents similar cases that might have slightly different diagnoses, which the students have to solve on their own. The similar cases are selected by a case-comparison component (Goos, 1994; F. Puppe & Goos, 1991) from about 1,200 real cases stored with the program.
  6. The students are not interrupted when doing something wrong unless they ask for comments.
  7. The tutor system can follow the students’ actions and criticizes suboptimal performance on demand.
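The selection of similar cases in step 5 can be sketched as a nearest-neighbor retrieval over sets of findings. This is a crude stand-in for the preselection strategies of the actual case-comparison component (Goos, 1994); the Jaccard measure and all names here are our own illustration.

```python
def similarity(findings_a, findings_b):
    """Jaccard similarity of two finding sets: a crude stand-in for the
    weighted similarity measures of the real case-comparison component."""
    a, b = set(findings_a), set(findings_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def most_similar(lecture_case, case_base, k=3):
    """Return the k stored cases most similar to the case from the lecture."""
    return sorted(case_base,
                  key=lambda case: similarity(lecture_case, case["findings"]),
                  reverse=True)[:k]
```

Applied to the roughly 1,200 stored real cases, such a retrieval yields cases that resemble the lecture case but may carry slightly different diagnoses, which is exactly what step 5 requires.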

Figure 6 Criticism of the justification of a hypothesis. First, the user has to mark, in a dynamic hierarchy like those in Figure 2, all findings he or she considers to be relevant for the hypothesis to be justified. The tutor system compares this justification with that from the knowledge base, which is transformed to eliminate the intermediate structure with data abstractions. The hard copy shows that the user hit many of the relevant findings but also missed a lot.

 

The principal way of interaction is the same as in the neurology trainer, because both systems use the same shell. Two displays illustrating some differences are presented. The first one shows the first phase of presentation and, in particular, the ability of the shell to illustrate findings with pictures (in Figure 7, notice the localization of painful joints). The second display shows the criticism of the user's diagnoses by the system and, in particular, that the case contains multiple diagnoses (Figure 8). This is typical of the rheumatology cases, which represent real (but anonymized) patients. By contrast, the Neurology Trainer contains model cases with one diagnosis per case.

Figure 7 Part of the presentation of the history of the case. If an apple appears in front of an item, the user can click on that item and get a picture illustrating the verbal description (here the painful joints). Pathologic data has a colored background in the dynamic hierarchy.

The Flower Classification Trainer

The flower classification trainer comprises nearly 100 common flowers of the region Mainfranken in central Germany. Each flower is characterized by its visible features only; that is, no special instruments are necessary to detect the features. Therefore, a classification with a decision tree, as in many classification books, was impossible. Instead, each feature value is rated by how strongly it is associated with each flower, which makes the classification scheme somewhat redundant and robust. Because most of the features are visual information, the training system was designed in a slightly different way than the neurology and rheumatology trainers discussed previously. Instead of presenting verbal descriptions of

Figure 8 Feedback for multiple diagnoses. Each diagnosis and its rating by the user (left part) is evaluated by the system. By clicking on a diagnosis, the user can enter his or her justification or study the system's evaluation. The last comment indicates that the student missed some diagnoses.

the features, the students get pictures of the flowers and have to enter the verbal descriptions themselves. Figure 9 shows the presentation of a case, here the flower "Wiesen-Pippau" (Crepis biennis). In addition to the full view of the plant, the user can get various detailed views by clicking around the presentation using a hypermedia system (Reinhardt, 1996). The verbal descriptors of the flower are entered with the standard questionnaire of the shell D3, an example of which is shown in Figure 10. To enhance the identification of the features, most are illustrated with abstract drawings in the questionnaire (similar to the joint localization in the rheumatology trainer). Because the knowledge of which feature can be seen in what region of what picture has been coupled to the knowledge base, the trainer can criticize the descriptive capability of the students. An example is shown in Figure 11, where the system summarizes the correctly and incorrectly recognized features. For the latter, the correct description, together with the marked region of the best suited picture, is presented on demand.
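The redundant rating scheme, in which each feature value is rated by how strongly it is associated with each flower, can be sketched as a simple score accumulation. The feature names and weights below are invented for illustration; the real knowledge base rates every feature value for every flower.

```python
# Hypothetical association weights (feature value -> flower -> strength).
ASSOCIATIONS = {
    ("petal colour", "yellow"): {"Crepis biennis": 3, "Bellis perennis": 0},
    ("leaf shape", "pinnatifid"): {"Crepis biennis": 2},
    ("petal colour", "white"): {"Bellis perennis": 3},
}

def classify(observed, associations=ASSOCIATIONS):
    """Accumulate the association scores of all observed feature values.
    Because every feature contributes some evidence, a single wrongly
    described feature does not derail the classification (robustness)."""
    scores = {}
    for feature_value in observed:
        for flower, weight in associations.get(feature_value, {}).items():
            scores[flower] = scores.get(flower, 0) + weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In contrast to a decision tree, where one wrong answer sends the student down the wrong branch, such a scoring scheme degrades gracefully when some visible features are misdescribed.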

The flower classification trainer has been tested with volunteer students from a practical flower classification course for biology students at Würzburg University. The students can also enter the name of the flower and get feedback, similar to the neurology and rheumatology trainers. The trainer was rated by the students as a useful and effective instrument for deepening and testing their knowledge.

Evaluation

There are many aspects to the evaluation of a tutorial system, including the following:

Figure 9 Presentation of a case with a hypermedia system with pictures instead of a verbal description.

A more holistic evaluation concerns the motivation of students to use the system and, ultimately, whether learning efficiency increases compared to conventional learning techniques. However, the latter is difficult to measure, because students usually learn from many sources in parallel, and it is difficult to single out the effects of any one of them. On the other hand, the motivation of the students to use the system seems to be quite high according to our current experience based on the tests in all three domains. In the following, we evaluate the more concrete aspects stated above.

User Interface for Case Presentation

The user interface with dynamic, user-manipulable hierarchies for case presentation was a key to the success of the system and the main difference from the first version of the tutor shell, named TUDIS (Poeck & Tins, 1993). In that system, all patient data was presented in one scrollable window. Because a complete case may well contain 100 to 200 pieces of information, it was nearly impossible to maintain an overview of the whole case. The user-manipulable hierarchies have greatly improved the possibilities for orientation, because the user can open and close parts depending on the relevance of the findings. Additional options include coloring pathologic findings or excluding nonpathologic findings from presentation.
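The mechanism of dynamic hierarchies can be sketched in a few lines. The following Python illustration uses our own names and a plain-text rendering in place of the graphical triangles of the shell.

```python
class Node:
    """One entry of a user-manipulable (dynamic) hierarchy: the user can
    open and close any node, so only the relevant part of a case with
    100 to 200 pieces of information is visible at a time (names are
    ours, not the shell's)."""

    def __init__(self, label, children=(), pathologic=False):
        self.label = label
        self.children = list(children)
        self.pathologic = pathologic
        self.is_open = False            # everything starts closed

    def toggle(self):
        """Open or close this node, as with the triangles of Figure 2."""
        self.is_open = not self.is_open

    def visible_lines(self, depth=0):
        """Render the label and, if the node is open, its children indented;
        pathologic findings are marked (the trainer colors them instead)."""
        suffix = " [pathologic]" if self.pathologic else ""
        lines = ["  " * depth + self.label + suffix]
        if self.is_open:
            for child in self.children:
                lines.extend(child.visible_lines(depth + 1))
        return lines
```

Closing a node hides its whole subtree, which is what allows the user to keep an overview of a case that would otherwise span several screen pages.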

Figure 10 Questionnaires to be answered by the students on the basis of the presentation from Figure 9.

A general criticism of the user interface in the two medical domains was that the presentation is highly structured, whereas, in reality, a major problem is to recognize the verbal descriptions from static and dynamic pictures, patients' utterances, and so on. In the flower classification trainer, precisely this was one of the main goals. However, providing pictures in this domain is relatively easy, because nearly all information is on one picture, with some enlargements for critical parts of the plant.

Although it is conceptually quite easy to collect multimedia material for one patient, and also possible to enter it in D3 as in the flower classification trainer, it is much more difficult to generate a multimedia presentation of patients from verbal finding descriptions and from correspondence knowledge about the descriptions and multimedia presentations. Another problem is to unify the data entry of Figure 10 with the data presentation of Figure 2 in order to have a compact presentation for the much larger amount of findings in medical domains compared to flower classification. Similarly, the two kinds of criticism for finding recognition and finding interpretation must be integrated. These are some of the main areas for improving the current version of the tutoring system.

Figure 11 After entering their verbal descriptors, the students get a criticism of their recognition capabilities. For each feature, the students can request the correct value and the picture with a marked region where the feature can be identified best.

The Degree of Choices for Actions Left to the Students

Because the basic assumption of the tutor system is that students learn best by doing, the amount of choice the students have in solving the case is of key importance. In the simplest version, the students only state their current hypotheses and justify them in terms of a selection of the findings presented to them. In an advanced mode, they can also select tests for clarifying their hypotheses. If those selections were made from multiple-choice menus with a few items, the students would get important hints about the correct solution. The test persons usually appreciated that they did not get such multiple-choice questions but had to select from the complete hierarchy of options represented in the system. The user-manipulable hierarchies were generally accepted as a means to quickly find the correct item, because the contents of the higher categories in the hierarchies greatly helped in navigation. A highly redundant heterarchy, where multiple paths lead to one particular diagnosis, is therefore advantageous, and the knowledge base was extended accordingly.

A general criticism of the medical training programs concerned the granularity of requesting data about the patient. In particular for history taking and physical examination, it is not sufficient to simply request a leading finding; rather, there are many different aspects of such leading findings that must be checked one by one. In the multimedia mode of the flower classification trainer, the situation is much more realistic, because the students have to know where to look for relevant information. Again this demonstrates the power of a multimedia representation of cases.

The Kind of Criticism of These Actions

For criticizing diagnostic hypotheses and test choices, the system not only evaluates whether the selected actions are on its own list but also checks how it has rated the students' hypotheses and tests based on the current amount of information known to them. Much more problematic is the criticism of the justification of these actions. Currently, justifications can only be stated in terms of the presented data. However, the knowledge base may contain a lot of intermediate conclusions, like data abstractions and intermediate diagnoses, which are necessary to form precise rules. These intermediate conclusions are compiled out for the purpose of criticizing the justifications. Although it would clearly be advantageous to allow justifications with intermediate conclusions, there are two problems: First, the user interface would require an additional user-manipulable hierarchy of intermediate conclusions from which the students can select, and second, the computation of the criticism would need more flexibility, because the students would then have the choice of stating a connection between a finding and a diagnosis in different ways, depending on the degree to which they use intermediate terms.
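Compiling out the intermediate conclusions can be sketched as transitively replacing every intermediate term by the terms it is derived from. This is our own simplification: real D3 rules carry ratings and complex operators, which are lost in such a flattening (the "loss of information" mentioned earlier).

```python
def compile_out(justification, derivations):
    """Replace every intermediate conclusion (data abstraction or
    intermediate diagnosis) transitively by the terms it is derived from,
    so that a justification can be compared on the level of presented data.
    derivations: intermediate term -> set of terms it is inferred from
    (assumed acyclic)."""
    flat = set()
    stack = list(justification)
    while stack:
        term = stack.pop()
        if term in derivations:            # intermediate conclusion: expand
            stack.extend(derivations[term])
        else:                              # raw finding: keep
            flat.add(term)
    return flat
```

Both the system's justification and the student's can be flattened this way, after which the comparison reduces to simple set operations on findings, independent of the problem-solving method used.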

The Content of Criticism Based on the Knowledge Base

A general problem in criticizing students is that the system must assume that its knowledge base is perfect, whereas in reality this is never the case. In particular, negative knowledge of why a certain hypothesis is very improbable at the moment is often missing. However, there are also more concrete problems with complex rules. If only part of the precondition of such a rule is fulfilled, then the rule is not applicable for the system, although the students may not see such sharp boundaries. In part, this problem can be alleviated by using n-from-m rules with a minimal and maximal cardinality, but a boundary effect still remains. Currently, the system sometimes seems to be overcritical, for example, when there is no evidence for a particular diagnosis because none of the rules is applicable, although there is only a near miss. However, this is not a principal problem like the "perfect knowledge base assumption" but can be improved by a metric specially designed for criticizing near misses.
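A near-miss metric of the kind suggested here could, for instance, report the fraction of fulfilled preconditions instead of a rule's binary applicability. The following is purely our illustration of the idea, not an implemented part of the shell.

```python
def match_degree(preconditions, findings):
    """Fraction of a rule's preconditions that hold: 1.0 means the rule
    is applicable; values just below 1.0 flag a near miss."""
    if not preconditions:
        return 1.0
    return sum(1 for c in preconditions if c in findings) / len(preconditions)

def near_misses(rules, findings, threshold=0.75):
    """Rules that almost fired; the criticism component could comment on
    these more leniently instead of reporting 'no evidence'."""
    return [name for name, preconditions in rules.items()
            if threshold <= match_degree(preconditions, findings) < 1.0]
```

With such a metric, a student whose justification matches three of a rule's four preconditions would be told about the missing one rather than that the diagnosis has no evidence at all.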

The Quality and Variety of the Cases to Be Presented

As already stated, a big advantage of the knowledge-based approach over pure hypertext-based tutorial systems is that a great variety of cases can be created without much effort, simply by entering the data of a case with a convenient interviewer component. The main restriction is that the knowledge base must be good enough to solve these cases correctly. Although there are no problems with model cases, real patient cases with multiple disorders are often ambiguous and require more sophisticated knowledge. Therefore, the limits of the training system are ultimately the limits of the underlying diagnostic problem-solving capability of the knowledge base.

Discussion and Outlook

In 1992, an article in the German journal for physicians Deutsches Ärzteblatt stated that the authors "expected that in the next 5 to 10 years the knowledge of a standard textbook in medicine will be transformed into an expert system, the textbook will be complemented by a handbook with a diagnostic table, and, above all, a tutorial program will be available to students on personal computers" (F. Puppe et al., 1992, p. 1253, translated). Inspired by this article, the neurology trainer, available by a voucher included in a standard neurology textbook (Poeck, 1994), has made this prognosis come true (concerning the diagnostic part of the textbook) and even leaves some time for improvements. Although the other applications in rheumatology and flower classification were not built according to a particular textbook, they hold a similar promise. All systems have been well received by external test users. Further projects in various medical and nonmedical domains are in advanced stages.

Besides continual improvements of the knowledge bases and the criticism component, the main extension will be the generation of multimedia case presentations from a knowledge base, case descriptions, and multimedia material, together with an integrated criticism of the interpretation and the correct recognition of the findings. Another task is the introduction of the intermediate diagnostic conclusions to the user, because they constitute the technical jargon that must also be learned by the students. Because the technical jargon, at least in medical domains, is not very well standardized, this requires, besides an extension of the user interface, that the students already be acquainted with the jargon, making the combination with a textbook even more attractive. Although the current training systems do not store any information about the students from one session to the next, such knowledge could be used, for example, for selecting cases particularly useful for them. The context of previous cases might also be useful for a Socratic dialog (Collins, 1977), in which similarities of the current case to former cases already solved by the students are demonstrated in order to help them generalize knowledge by themselves. Another way to help the students memorize knowledge is the addition of informal support knowledge, in particular causal metaphors (the exact causal process may be too detailed for this purpose). Our future work aims at incorporating the suggested improvements into the shell.

References

Berner, E., Webster, G., Shugerman, A., Jackson, J., Algina, J., Ball, E., Cobbs, G., Dennis, V., Frenkel, E., Hudson, L., Mancall, E., Rackley, C., & Taunton, D. (1994). Performance of four computer-based diagnostic systems. New England Journal of Medicine, 330, 1792-1796.

Buchanan, B., & Shortliffe, E. (Eds.). (1984). Rule-based expert systems: The MYCIN experiments. Reading, MA: Addison-Wesley.

Clancey, W. (1985). Heuristic classification. Artificial Intelligence, 27, 289-350.

Clancey, W. (1987). Knowledge-based tutoring: The GUIDON-program. Cambridge, MA: MIT Press.

Collins, A. (1977). Processes in acquiring knowledge. In R. Anderson, R. J. Spiro, & W. Montague (Eds.), Schooling and the acquisition of knowledge. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Eysenbach, G. (1994). Computer-manual. Munich: Urban & Schwarzenberg.

Fontaine, D., Beux, P., Riou, C., & Jacquelinet, C. (1994). An intelligent computer-assisted instruction system for clinical case teaching. Methods of Information in Medicine, 33, 433-445.

Gappa, U., Puppe, F., & Schewe, S. (1993). Graphical knowledge acquisition for medical diagnostic expert systems. Artificial Intelligence in Medicine, 5, 185-211.

Goos, K. (1994). Preselection strategies for case based classification. In B. Nebel & L. Dreschler-Fischer (Eds.), Proceedings of the KI-94 (LNAI 861, pp. 28-38). Berlin: Springer.

Inui, M. (1991). Fundamental research on simulation based intelligent tutoring systems. Proceedings of the World Congress on Expert Systems, Vol. 1 (pp. 329-336). London: Pergamon Press.

Lincoln, M. (1989). ILIAD training enhances students’ diagnostic skills. Journal of Medical Systems, 15, 93-110.

Miller, R., & Masarie, F. (1989). Use of the quick medical reference (QMR) program as a tool for medical education. Methods of Information in Medicine, 28, 340-345.

Poeck, K. (1994). Neurologie [Neurology] (9th ed.). Berlin: Springer.

Poeck, K., & Tins, M. (1993). An intelligent tutoring system for classification problem solving. In H.J. Ohlbach (Ed.), Proceedings of the GWAI-92 (pp. 210-220). Berlin: Springer, IFB.

Puppe, B., Ohmann, C., Goos, K., Puppe, F., & Mootz, O. (1995). Evaluating four diagnostic methods with acute abdominal pain cases. Methods of Information in Medicine, 34, 361-368.

Puppe, B., & Puppe, F. (1988). A knowledge representation concept facilitating construction and maintenance of large knowledge bases. Methods of Information in Medicine, 27, 10-16.

Puppe, F. (1992). Intelligente Tutorsysteme [Intelligent tutoring systems]. Informatik Spektrum, 14, 195-207.

Puppe, F., & Goos, K. (1991). Improving case based classification with expert knowledge. In T. Christaller (Ed.), Proceedings of the GWAI-91 (IFB 285, pp. 196-205). Berlin: Springer.

Puppe, F., Gappa, U., Poeck, K., & Bamberger, S. (1996). Wissensbasierte Diagnose- und Informationssysteme [Knowledge-based diagnosis and information systems]. Berlin: Springer.

Puppe, F., Puppe, B., & Gross, R. (1992) Lehrbuch/Expertensystem-Kombination für die medizinische Ausbildung [Textbook/expert system combination for medical education]. Deutsches Ärzteblatt, 89 (14), 1247-1253.

Reinhardt, B. (1996). Expert systems and hypertext for teaching diagnostics. Proceedings of the European Conference on Artificial Intelligence in Education. Lisbon.

Schewe, S., & Schreiber, M. (1993). Stepwise development of clinical expert system in rheumatology. The Clinical Investigator, 71, 139-144.

Schewe, S., Quak, T., Reinhardt, B., & Puppe, F. (1996). Evaluation of a knowledge-based tutorial program in rheumatology: A part of a mandatory course in internal medicine. In C. Frasson, G. Gauthier, & A. Lesgold (Eds.), Proceedings of the Third International Conference on Intelligent Tutoring Systems in Montreal (pp. 531-539). Berlin: Springer.

Wenger, E. (1987). Artificial intelligence and tutoring systems. Los Altos, CA: Morgan Kaufmann.

Woolf, B. (1988). Intelligent tutoring systems: A survey. In H. Shrobe (Ed.), Exploring artificial intelligence (pp. 1-44). Los Altos, CA: Morgan Kaufmann.

Woolf, B. (1992). AI in education. In S. Shapiro (Ed.), Encyclopedia of artificial intelligence (2nd ed., Vol. 1, pp. 434-444). New York: Wiley.