Proceeding to ITS 96

Intelligent Tutoring Systems 98 in Montreal

Evaluation of a Knowledge-Based Tutorial Program in Rheumatology - a Part of a Mandatory Course in Internal Medicine

Stefan Schewe1, Thomas Quak1, Bettina Reinhardt2, Frank Puppe2

1) Medizinische Poliklinik der Universität München, Pettenkoferstr. 8A, D-80336 Munich, e-mail:

2) Lehrstuhl Informatik VI der Universität Würzburg, Allesgrundweg 12, D-97218 Gerbrunn, e-mail: reinhardt/


A serious limitation of medical tutoring software in Germany is its lack of integration into routine university education. In this study, we propose a scenario for such an integration and discuss the results of an evaluation. The scenario complements lectures with patient presentations by a training system that shows the same cases in a formalized form, so that the students can consolidate their newly acquired knowledge. The students may then test themselves by solving a symptomatically similar case with a potentially different diagnosis while being criticized by the training system. The technical prerequisites for this scenario are a knowledge-based training system for solving the newly entered cases from the lectures and a case-comparison component for finding symptomatically similar cases in a large case base. Both are provided by the diagnostic shell box D3.

1. Introduction

While medical training programs are widely available, they are currently not well integrated into university curricula. For example, real patient presentations in lectures are quite popular, but most case-based training programs cannot adapt to the data of these particular patients. In the following, we present a scenario in which the same patient data are presented both in real life and in a computer training system. This scenario has been evaluated in two successive rheumatology courses at the University of Munich with very encouraging results.

Sections 2 and 3 describe the technical and the organisational parts of the evaluation environment. Sections 4 and 5 present and discuss the results.

2. The tutoring shell

Tutoring systems for medical education have become quite popular (see e.g. [Eysenbach 94]). While many of them are based on the hypertext / hypermedia technique, consisting of links between predefined windows, the idea of intelligent tutoring systems (e.g. [Wenger 87]) is to generate the presentation of the taught subject from knowledge of the underlying domain as well as didactic knowledge. Case-oriented tutoring systems, for example, can be designed in both ways. The patient case can be presented with a hypermedia system in which the sequence and the contents of the windows are prepared for this particular case. The other approach is to build a knowledge base capable of solving cases and to use real cases for tutorial purposes. While the costs of building hypermedia-based training systems are directly proportional to the number of cases included, knowledge-based training systems require a large initial effort to build and test the knowledge base but afterwards only minimal resources for the addition of any number of new cases. The first system exploring this approach was GUIDON [Clancey 87], a tutoring system based on the expert system MYCIN. Two insights gained from this work are that a general problem-solving method like MYCIN's backward chaining of rules severely restricts the explainability of the program for tutorial purposes, and that an unstructured rule format makes it difficult for students to distinguish the key clause in a rule precondition from the context and activation clauses. Commercially available knowledge-based training systems are the tutor versions of ILIAD [Lincoln et al. 92] and QMR [Miller & Masarie 89]. They avoid the problems of GUIDON/MYCIN by using a much simpler knowledge representation.

The general architecture of knowledge-based tutorial systems is straightforward: in addition to the basic components of expert systems (a knowledge base and a problem solver, a knowledge acquisition component, and an explanation and interviewer component), the specific tutorial features are case presentation and components for criticism (see e.g. [Fontaine et al. 94]).

A special feature of our shell D3 is the integration of different problem-solving methods, in particular the use of heuristic and case-based knowledge [Puppe et al. 94]. Developing a new training system with D3 requires the construction of a knowledge base and the addition of cases with the interviewer component. An attractive feature is that the author of the lectures can build or modify the knowledge base with a convenient graphical knowledge acquisition facility [Gappa et al. 93]. The knowledge base contains sufficient knowledge about the hierarchical or heterarchical structure of the findings to allow the automatic generation of (textual) case presentations in several modes, ranging from detailed presentations of findings to a concentration on the key diagnostic elements for better control.

At the beginning of a new case the training system presents initial data about the patient. The user then selects tests to investigate his or her hypotheses. On demand the system comments on the user's actions and can also criticise his or her justifications. Criticism of hypotheses is generated by comparing them with the hypotheses inferred by the system from the same data the user has interpreted so far. Criticising the user's justifications is more difficult because the system bases its conclusions on intermediate (pathophysiological) concepts derived from the raw data. The user is unaware of these intermediate concepts, which have to be compiled out to assess how strongly individual raw findings support the hypotheses. Criticism of the test choices is easily generated since that knowledge is represented explicitly in the knowledge base: for each diagnosis a sequence of useful tests is specified. When the user selects a test, the system compares it with its own choices (also considering second-rate choices) based on the diagnoses suspected at the present stage of the tutorial session. The user can also justify a test selection by referring to his or her suspected hypotheses; the system then criticises the justification with respect to the correctness of the suspected hypotheses and to how well they can be investigated by the selected tests. More details concerning the tutoring component of D3 can be found in [Reinhardt & Schewe 95] and [Puppe & Reinhardt 95].
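The criticism of test choices described above can be illustrated with a minimal Python sketch. It is not the D3 implementation: the table `USEFUL_TESTS`, its entries and the function `criticise_test_choice` are hypothetical names introduced here for illustration, and the three-level verdict is only an assumption about how first-rate and second-rate choices might be distinguished.

```python
from typing import Dict, List

# Hypothetical knowledge-base fragment (illustrative entries only, not taken
# from the RHEUMA knowledge base): for each diagnosis, the useful tests in
# order of preference.
USEFUL_TESTS: Dict[str, List[str]] = {
    "rheumatoid arthritis": ["rheumatoid factor", "hand x-ray", "ESR"],
    "gout": ["serum uric acid", "joint aspiration"],
}


def criticise_test_choice(
    chosen_test: str,
    suspected_diagnoses: List[str],
    useful_tests: Dict[str, List[str]] = USEFUL_TESTS,
) -> str:
    """Compare the user's test with the tests the system would choose.

    Returns "first choice" if the test heads the list of some suspected
    diagnosis, "acceptable" if it appears further down such a list
    (a second-rate choice), and "not indicated" otherwise.
    """
    first_rate = set()
    second_rate = set()
    for diagnosis in suspected_diagnoses:
        tests = useful_tests.get(diagnosis, [])
        if tests:
            first_rate.add(tests[0])
            second_rate.update(tests[1:])
    if chosen_test in first_rate:
        return "first choice"
    if chosen_test in second_rate:
        return "acceptable"
    return "not indicated"
```

In this sketch the suspected diagnoses are passed in explicitly; in a tutoring session they would be the hypotheses the system has inferred from the data revealed so far.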

3. The evaluation scenario

The learning environment

The area of rheumatology is partly covered by a mandatory curriculum for the students of Internal Medicine. The rheumatology class is divided into groups of two or three students and is held regularly on Tuesdays and Wednesdays, for 90 minutes each day.

First, the students are given 30 minutes to take the medical history of a new patient and to perform the most important medical examinations. After that, the professor thoroughly discusses the case with the students while the patient is still present: further questions relevant to the symptoms of the disease are asked and the essential results of the medical examination are demonstrated. Moreover, students and professor consider the further diagnostic measures, delineate the options for the differential diagnosis and, assuming the diagnosis, discuss a therapy.

Each group of students is assigned to the rheumatology section for two consecutive weeks. On Tuesdays the students are confronted with a rheumatological case and its discussion in the manner described above, whereas on Wednesdays they work with a tutorial computer presentation of the Tuesday case in the version for beginners (Beg.Comp). In addition, one of 1017 rheumatological cases contained in a database, with a similar medical history but a potentially different diagnosis, is presented. This similar case has to be solved by the students. The following week, the same procedure is repeated with a new patient.
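How a symptomatically similar case might be picked from the case base can be sketched as follows. The similarity measure actually used by the case-comparison component of D3 is not described here, so the sketch assumes a plain Jaccard overlap of findings; the `Case` class, the function names and the `require_different_diagnosis` switch are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Case:
    """A stored patient case: an identifier, its findings and its diagnosis."""
    case_id: str
    findings: FrozenSet[str]
    diagnosis: str


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Share of findings common to both cases (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_case(
    query: Case,
    case_base: List[Case],
    require_different_diagnosis: bool = False,
) -> Optional[Case]:
    """Return the stored case whose findings overlap most with the query case.

    The query case itself is skipped. By default the diagnosis may or may not
    differ ("potentially different"); the flag forces a different diagnosis.
    """
    candidates = [
        c for c in case_base
        if c.case_id != query.case_id
        and (not require_different_diagnosis or c.diagnosis != query.diagnosis)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: jaccard(query.findings, c.findings))
```

A real case comparison would probably weight findings by diagnostic relevance rather than count them equally; the unweighted overlap is used only to keep the example short.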

Beginning with the winter term of 1994/95, the Wednesday session was changed. Instead of using the tutorial computer program each Wednesday, both the Tuesday patient and the corresponding database case were presented to the students as a structurally identical text version providing exactly the same information as the computer program.

In the summer term of 1995 a newly developed advanced version of the computer program (Adv.Comp) was tested in the second week of the class, whereas in the first week the former version for beginners was used.

The tutorial program

The program combines the tutorial environment of the shell box D3 [Gappa et al. 93] with the knowledge base of the expert system "RHEUMA", which has been evaluated on numerous patients [Schewe 93].

The computer version for beginners (Beg.Comp) provides the information en bloc in a very coherent manner. The questions and the patient's answers concerning the medical history are followed by the clinical examination, the laboratory results and finally all technical and miscellaneous procedures. As a physician would, the students are supposed to decide on a diagnosis at the end of each section. In doing so they can have recourse to the hierarchical order of the diseases and merely specify a global diagnosis such as inflammatory rheumatological disease, spondylarthropathy etc., or choose particular diagnoses defined within this diagnostic hierarchy (figure 1), each with a global degree of probability (suspected or confirmed). The computer system then criticises the student's choice by comparing it with the solution of the knowledge-based system at the same level of information. The student can always call up explanations of terms and concepts as well as justifications of the knowledge-based rules. The student can also explain his or her own diagnosis to the computer.
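The comparison of a student's diagnosis with the system's solution inside a diagnostic hierarchy can be sketched as below. The fragment of the hierarchy in `PARENT`, the function names and the wording of the verdicts are hypothetical and chosen only for illustration; the hierarchy in RHEUMA covers over 70 diagnoses and its feedback rules are not specified in this paper.

```python
from typing import Dict, List, Optional

# Hypothetical fragment of a diagnostic hierarchy: child -> parent
# (None marks a top-level category). Illustrative only.
PARENT: Dict[str, Optional[str]] = {
    "inflammatory rheumatological disease": None,
    "spondylarthropathy": "inflammatory rheumatological disease",
    "spondylitis ankylosans": "spondylarthropathy",
    "reactive arthritis": "spondylarthropathy",
    "degenerative joint disease": None,
}


def path_to_root(diagnosis: str, parent: Dict[str, Optional[str]] = PARENT) -> List[str]:
    """Return the diagnosis followed by all of its ancestors."""
    path: List[str] = []
    current: Optional[str] = diagnosis
    while current is not None:
        path.append(current)
        current = parent.get(current)
    return path


def feedback(student: str, system: str) -> str:
    """Grade the student's diagnosis against the system's diagnosis."""
    if student == system:
        return "correct"
    student_path = path_to_root(student)
    system_path = path_to_root(system)
    if student in system_path:
        # The student named a global diagnosis that contains the solution.
        return "correct but too general"
    if system in student_path:
        # The student is more specific than the system at this stage.
        return "more specific than the data support"
    if set(student_path) & set(system_path):
        # Different diagnoses that share a category in the hierarchy.
        return "right category"
    return "wrong category"
```

For example, with this fragment a student who answers "spondylarthropathy" when the system infers "spondylitis ankylosans" would be told that the answer is correct but too general.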

As in the beginner version, the advanced version (Adv.Comp) of the tutorial computer program supplies the preliminary information on the medical history systematically. As in future medical practice, the student can then freely choose, from a wide range of possible laboratory, radiological and other examinations, those considered indispensable for the case at hand. After that the student again specifies a diagnosis, which is also assessed by the computer system. This process of selecting relevant results can be repeated several times until a diagnosis of a specific disease is possible.

The text version contains the information provided in the computer version for beginners and consists of a printout of both the known case and the unknown similar case. Here, too, the students have to try to find a diagnosis after each individual block.

Figure 1: The student chooses one more diagnosis and gets friendly feedback because he or she has selected at least one diagnosis in the right category. The dynamic hierarchy is necessary for choosing from over 70 diagnoses.

The patients

The patients who volunteer to answer the questions and undergo the students' examinations are usually ambulatory. Most of them are new to the policlinic and have been referred by general practitioners and hospitals of the area.

The professors, the students

Eighteen professors participate in the Internal Medicine curriculum; some are assistant physicians, others are specialists in Internal Medicine or senior physicians. One physician is responsible for the field of rheumatology.

All the students are in the eighth or ninth clinical term of medical school. The Internal Medicine curriculum is a mandatory prerequisite held at five university clinics of Internal Medicine. The students cannot choose which course they attend.

The parameters for evaluation

The students' opinions are obtained by a questionnaire on each day of class. The students assess the quality of the course in Internal Medicine on a scale from 1 (does not apply) to 10 (applies) with respect to their motivation, the organisation of the course, its efficiency, the professor's achievement and their general impression. The questionnaire is anonymous for the students; the professor can freely decide whether he wants to appear on it by name and be notified of its results towards the end of the term.

In the rheumatology class the students complete a multiple choice test on the disease of the patient presented each week, before and after the two days of the course, as well as a 50-item questionnaire after working with the whole computer system or the text version in class. The achievement exam concluding the term is also based on multiple choice. Besides questions on Internal Medicine, it describes one additional internal medicine case and one rheumatological case, each with brief information on the medical history. The students have to state the clinical, laboratory and technical examinations necessary to diagnose the disease.


For the statistical analysis, Student's t-test was used to compare normally distributed, independent means; the 95% confidence intervals were determined using the same method. Differences between dependent values were tested with Student's t-test or the Wilcoxon signed rank test for paired samples, depending on the distribution of the data. All statistical tests were two-tailed, and a P value of 0.05 or less was taken to indicate statistical significance.
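As an illustration of the unpaired comparison, the following Python sketch computes a pooled-variance Student's t statistic from summary statistics. It is not the analysis code used in the study. The example numbers are the pre-course self-assessment values reported in the results section (3.86 and 1.83 in the winter term, 3.11 and 1.15 in the summer term); treating the second number of each pair as a standard deviation and taking 22 and 19 respondents are assumptions, and the critical value comes from standard t tables.

```python
import math
from typing import Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Student's t statistic for two independent groups with pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def ci_half_width(sd: float, n: int, t_crit: float) -> float:
    """Half-width of a confidence interval for a mean: t_crit * sd / sqrt(n)."""
    return t_crit * sd / math.sqrt(n)


# Two-sided 5% critical value of t for 39 degrees of freedom (standard tables).
T_CRIT_39 = 2.023

# Assumed inputs: winter term (n = 22) versus summer term (n = 19).
t, df = pooled_t(3.86, 1.83, 22, 3.11, 1.15, 19)
significant = abs(t) > T_CRIT_39
# t is about 1.54 with 39 degrees of freedom, below the critical value,
# which agrees with the reported absence of a significant difference.
```

The paired comparisons (before and after the course within one group) would need the individual differences per student, which are not reported, so they are not sketched here.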

4. Results

In the winter term of 1994/95, 22 students attended the rheumatology class, and in the summer term of 1995, 19. The Internal Medicine curriculum as a whole was evaluated only in the summer term of 1995: 28 students participated, and 562 student hours were held by 11 professors. Of these, 405 hours were evaluated by the students (a response rate of 72.1%); 22.5% of the evaluations contained suggestions for improvement, and 37.5% were signed by the professor.

The opinion of the students in general was very positive (mean value ± 95% confidence interval):

"Today's class was motivating to me!" (motivation): 8.064 ± 0.165 points,

"It seems like I have learnt a lot!" (efficiency): 8.074 ± 0.170 points,

"The prof. made a considerable effort for my education!" (prof): 8.402 ± 0.163 points,

"The course was well organised!" (organisation): 8.259 ± 0.189 points.

No significant differences could be observed between the assessment of the professor of rheumatology and that of the other professors, although there were considerable differences between individual professors in several categories. Professors who chose not to stay anonymous received markedly better grades than professors who did.

The questionnaires of 41 students (22 in the winter term of 1994/95 and 19 in the summer term of 1995) yielded the following results with respect to the students' motivation (excerpt from the results of the 50-item questionnaire; mean value ± standard deviation, minimum, maximum).

Question: "Do you think the employment of tutorial computer systems in your studies makes sense?": 7.95 ± 1.72; min 2, max 10

Question: "Are you having fun with the computer program?": 7.03 ± 2.06; min 2, max 10

Question: "Do you think that you can effectively study with the program?": 6.93 ± 2.30; min 2, max 10

The program was assessed by the students as follows:

Statement: "The program is able to:"

"replace a professor": 1.82 ± 1.61; min 1, max 8,

"support a professor": 6.87 ± 2.24; min 1, max 10,

"improve my diagnostic skills": 6.82 ± 2.15; min 1, max 10,

"support independent study": 7.61 ± 2.40; min 2, max 10.

The students clearly favour independent study:

Question: "Where would you most frequently use a computer program (assume that it is accessible)?"

"computer at the university, independently": 5.63 ± 2.73; min 1, max 10

"computer at home, independently": 7.50 ± 2.67; min 1, max 10

"computer with other students": 3.74 ± 2.70; min 1, max 10

"replacement for practical courses": 1.44 ± 1.18; min 1, max 7

"supplement for practical courses": 7.05 ± 2.78; min 1, max 10

The following results were obtained concerning the students' assessment of how their knowledge improved (see figure 2). When the winter term started, the students rated their knowledge of rheumatology at 3.86 ± 1.83 points on a scale of 1 (no knowledge of rheumatology at all) to 10 (good knowledge of rheumatology). At the beginning of the summer term the value did not differ significantly (3.11 ± 1.15). After the two-day class, the students' self-assessment stood at 6.09 ± 1.04 in the winter term and at 5.16 ± 1.26 in the summer term; again the two terms did not differ significantly, but both show a highly significant improvement in how the students estimated their knowledge. After the tenth day the students considered their rheumatological knowledge to have decreased. The value of 3.05 ± 1.36 for the summer term was considerably lower than the value of 4.41 ± 1.30 for the winter term, but neither differed significantly from the respective pre-test value. In the winter term, the regular course together with the text version of the system raised the students' assessment of their knowledge from 4.41 ± 1.30 to 6.14 ± 1.25 points, whereas in the summer term the regular course together with the advanced version achieved a significantly greater increase, from 3.05 ± 1.36 to 6.09 ± 1.04. According to the students, every teaching effort produced a temporary, highly significant increase in their estimated knowledge, which after one week had fallen back to the pre-test level (figure 2).

Figure 2: Students' assessment of their own rheumatological knowledge on an analogue scale of 1 (no knowledge of rheumatology) to 10 (good knowledge of rheumatology), representation of mean values.

In parallel to these results, significant increases in the students' knowledge, measured objectively in multiple choice (MC) tests, could be observed. Again the students' knowledge decreased to the pre-test level within one week.

The clinical case presented to the students in the written knowledge test after the two-day class consisted of a short outline of a patient's medical history similar to the one actually discussed in class. The students had to state in free text which additional questions on the medical history they would ask as a physician and which clinical and technical examinations as well as laboratory tests they would require. The answers given by the students were summed up and assessed. Again, considerable increases in correct answers could be observed after each two-day course. This time, however, the number of correct answers decreased only slightly after one week and still differed significantly from the pre-test values. The summer results showed slower increases in knowledge than the winter results, possibly reflecting a lack of motivation that the students may have experienced due to external circumstances. Most summer students did not list the possible examinations completely, so that all specifications, with the exception of those concerning the technical examinations, increased significantly during the winter term, whereas in the summer term only the enumerated questions on the patient's medical history improved.

The free-text exam at the end of the term showed no differences in correct answers between a comparatively easy general internal medicine case (angina pectoris in the winter term, pneumonia in the summer term) and a more difficult rheumatological case (sarcoidosis in the winter term, spondylitis ankylosans in the summer term). Results were better for questions concerning clinical problems (medical history, clinical examination) than for those concerning technical examinations. Thus the aim of the education could be achieved.

5. Discussion

Similar to the evaluation results of other tutor systems ([Preiss et al. 92]), students at an advanced stage of their medical training hold an overall very positive view of the "RHEUMA" tutor system. They are motivated to work with it and clearly identify its advantages and the areas in which its application makes sense. It becomes obvious that the students most welcome the use of a computer system for independent study, whereas they generally reject all efforts to integrate this tutoring device in universities and clinics without making it part of the regular program. A tutor system can substitute neither a professor nor specific classes; rather, it serves as a motivating factor for students to use their books [Lilienfield & Broering 94]. This assumption is substantiated by the fact that in the final exam a difficult rheumatological case, discussed with only a few students during the class, is solved with the same results as an easy clinical case of Internal Medicine belonging to the daily routine of the physician. The tutoring system as a replacement for textbooks, however, has not been tested in our study. The fact that the students assessed the rheumatology professor no differently from the other professors was intended to show that the positive evaluation was not merely an expression of the professor's achievement but reflected how they regarded the complete scenario.

An interpretation of the results of the objective knowledge tests poses a greater challenge. The students' self-assessment as well as the objective multiple choice test on the students' rheumatological education clearly show a temporary increase in their knowledge that is relatively independent of the computer system used during the course (Adv.Comp or Beg.Comp), with slightly better mean results for the advanced version. Similar results were also obtained by [Preiss et al. 92]. Even though the students' knowledge again decreases considerably within a week, it is reactivated by the repetition of the whole course with a new emphasis in the field of rheumatology. In the students' self-assessment of their knowledge the advanced version of the computer system ranked highest, even though the difference from the text version was insignificant. The increase in the students' knowledge over a period of 12 days proceeds in waves but is nevertheless continuous. The final exam reveals the situation of a student: being able to achieve the same results in the specialized subject of rheumatology (which at least in Germany is often considered not so important) as in a standard field of Internal Medicine that is frequently practiced and generally known to the students, such as cardiology (angina pectoris) or infectious diseases (tuberculosis), must be very satisfying.

Of course the students also detected weaknesses of the program, which should not be concealed. Points of criticism were the still missing multimedia illustrations, the explanations of the system and especially a lack of relevance for their future daily practical work. Either the students underestimate the importance of rheumatology for their future work as physicians, or they believe that the skills needed for the recognition and treatment of rheumatological diseases cannot be sufficiently acquired with the computer system. The second point is contradicted by the experience of the professor in the rheumatology course. The fact that the tutoring system "RHEUMA" is purely a device for diagnostic support and does not deal with individual therapies or therapeutic strategies might also play a role.

Compared with a great variety of medical tutoring programs ([D'Alessandro et al. 93], [Lilienfield & Broering 94], [Preiss et al. 92]), "RHEUMA" has obvious advantages that deserve mention. The system exclusively presents everyday clinical cases adapted for physicians willing to improve their rheumatological knowledge. At present the database consists of 1017 cases of patients with joint complaints diagnosed by the computer system "RHEUMA", a treasure chest of examples which can be individually chosen to illustrate specific rheumatological diseases. The system does not use artificial patients which have been created for educational purposes and whose profiles are overloaded with numerous aspects aimed at the theoretical traits of a disease.

Besides the results of this study, the following aspects have to be considered in the further development of the system. It has to be used on a regular basis by the students so that they get accustomed to it, are able to criticise it, and the program can be adjusted to their needs. The knowledge base, developed for the diagnostic support of a non-rheumatologist, has to be altered for educational purposes: international standards, as far as they have been defined, as well as other diagnostic criteria have to be introduced to the students explicitly. Considerable additions have to be made to the user interface by implementing new areas of explanation, multimedia use of graphics and sound, and references to standard literature. Another logical development for future work is the integration of a simulation system that makes it possible to criticise the therapy of a patient. This addition could be modelled on the cardiac tutor [Eliot and Woolf 95], which also has a well-developed student model, another component missing from the trainer shell used here.

The program thus provides a means of supporting the professors' lectures. It can be used effectively within medical education; the aim should be a quality cycle for improving rheumatological care. The factors of cost and efficiency in diagnostic measures will have to be considered more often in the future: not only will the costs have to be calculated, but an optimal and rational diagnostic path will also have to be followed ("coaching", [Burton and Brown 82]). With a combination of textbook and personal computer, the horizon of students as well as physicians in the field of rheumatology can be considerably widened. Both groups of learners, however, have to be determined to acquire patterns for the solution of exemplary clinical cases at their own learning pace and on their own initiative.


References

Burton A, Brown J.: An Investigation of Computer Coaching for Informal Learning Activities. In Sleeman D., Brown J. (Eds) Intelligent Tutoring Systems, Academic Press, London, 1982.

Clancey, W.: Knowledge-Based Tutoring: The GUIDON-Program, MIT Press, 1987.

Cutts JH et al: A graphics assisted learning environment for computer-based interactive videodisc education. Int J Biomed Comput (England) 31, 141-5, 1992.

D'Alessandro MP et al: The instructional effectiveness of a radiology multimedia textbook (HyperLung) versus a standard lecture. Invest Radiol (U S) 28, 643-8, 1993.

Eliot C and Woolf BP: An Adaptive Student Centered Curriculum for an Intelligent Training System. User Modelling and User-Adapted Interaction 5, 67-86, 1995.

Eysenbach G: Computer-Manual, Urban & Schwarzenberg, 1994.

Fontaine D et al: An Intelligent Computer-Assisted Instruction System for Clinical Case Teaching, Meth. Inform. Med. 33, 433-445, 1994.

Gappa, U., Puppe, F., and Schewe, S.: Graphical Knowledge Acquisition for Medical Diagnostic Expert Systems, Artificial Intelligence in Medicine 5, 185-211, 1993.

Lilienfield LS, Broering NC: Computers as teachers: learning from animations. Am J Physiol (United States), 266, 47-54, 1994.

Lincoln MJ et al: Iliad's role in the generalization of learning across a medical domain. Proc Annu Symp Comput Appl Med Care (United States), 174-8, 1992.

Miller, R. and Masarie, F.: Use of the Quick Medical Reference (QMR) program as a tool for medical education, Meth. Inform. Med. 28, 340-5, 1989.

Nashel DJ, Martin JJ: Images in Rheumatology: a multimedia program for medical education. Proc Annu Symp Comput Appl Med Care (United States), 798-9, 1992.

Preiss B et al: Graphic summaries of expert knowledge for the medical curriculum: an experiment in second-year nephrology. Methods Inf Med (Germ.) 31, 303-9, 1992.

Puppe, F., Poeck, K., Gappa, U., Bamberger, S., & Goos, K.: Reusable Components for a configurable Diagnostics Shell, (Germ.), KI 2/1994, 13-18, 1994.

Puppe, F. and Reinhardt, B.: Generating Case-Oriented Training from Diagnostic Expert Systems, to appear in Machine Mediated Learning, 1995.

Reinhardt, B. and Schewe S.: A shell for intelligent tutoring systems, Proceedings of AI-ED-95

Schewe S., Schreiber M.A.: Stepwise Development of a Clinical Expert System in Rheumatology. Clin. Investig 71, 139 - 144, 1993.

Wenger, E.: Artificial Intelligence and Tutoring Systems, Morgan Kaufmann, 1987.

Widmer G, Horn W, Nagele B: Automatic knowledge base refinement: learning from examples and deep knowledge in rheumatology. Artif Intell Med (Netherlands) 5, 225-43, 1993.