Formative Evaluation:

What, Why, When, and How


This paper describes formative evaluation by comparing two books, one by Tessmer and one by Flagg. One can employ Tessmer's stages of formative evaluation (expert review, one-to-one, small group, and field test) in instructional design, or Flagg's phases (pre-production, production, and implementation) in instructional technologies. This paper recommends Tessmer's book for beginners because of its detailed explanation of how to plan the evaluation, collect the data, and analyze it.


  1. Background

Having no methods of its own, evaluation has always borrowed strategies from all the social sciences (Krathwohl, 1998). Evaluation can evaluate almost anything: a person, a curriculum, a student, a process, a product, or a program. Each kind of evaluation has its own name, such as personnel evaluation, program evaluation, or teaching evaluation. Evaluation also differs from research not in its methods but in other respects, such as being decision-driven (to facilitate decision making) and utilization-oriented (the usefulness of the process). The intent of evaluation is to reduce uncertainty and to provide an information-rich decision-making environment. Compared to research, evaluation gives a better information basis for action (decisions).

Evaluation is variously conceived as a tool for more effective program management, a means of empowerment for those affected by programs, an effort to be responsive to the concerns of stakeholders, a judgment directed by highly competent professional opinion, a means to conclusions and recommendations, or a process of negotiation among stakeholders to produce an agenda for further negotiation. Some experts see evaluation as the responsibility of connoisseurial judgment by an area's experts; others see it as best done as naturalistic, descriptive, qualitative research, or as embedded in measurement and experimentation (Krathwohl, 1998).

Evaluation is oriented primarily toward gathering information that will facilitate improving a person, a curriculum, a student, a process, a program, or a product (formative), or that will help determine its value or worth (summative). Many experts have analyzed the difference between formative and summative evaluation. Markle (1989, in Tessmer, 1993) wrote that summative evaluation is evaluation to prove, while formative evaluation is evaluation to improve the programs or products. Baker and Alkin (1973, in Tessmer, 1993) characterized the difference as evaluation for validation (summative) versus evaluation for revision (formative). Scriven (1991, in Krathwohl, 1998) quoted Robert Stake on the formative and summative distinction: "When the cook tastes the soup, that is formative evaluation; when the guest tastes it, that is summative evaluation".

This article describes formative evaluation thoroughly as a tool for improving instructional programs, products, and materials. By comparing two books on formative evaluation, i.e. "Planning and conducting formative evaluation" by Martin Tessmer (1993) and "Formative evaluation for educational technologies" by Barbara N. Flagg (1990), this paper will summarize and analyze the following questions:

  1. What is formative evaluation?
  2. When is formative evaluation conducted, and why or why not?
  3. What stages of formative evaluation are there?
  4. How is formative evaluation planned?
  5. What are the remarks and critical issues in conducting formative evaluation?

This paper will give a recommendation on using the two formative evaluation books. It will focus on answering those questions and describing the books' strengths and weaknesses by analyzing them critically and by giving opinions and reasons on whether the books are valuable or usable.

  2. Definition

Tessmer (1993) explicitly defined formative evaluation as a judgment (of the strengths and weaknesses of instruction in its developing stages) made for the purpose of revising the instruction to improve its effectiveness and appeal. He also defined evaluation itself simply as a process of gathering data to determine the worth or value of instruction, its strengths and weaknesses. The evaluation is conducted by collecting data about the instruction from a variety of sources, using a variety of data-gathering methods and tools. Readers should understand that this data-gathering process is very important, since formative evaluation is a judgment for improving the effectiveness of the instruction (products, programs, or materials).

Flagg (1990) does not give an explicit definition of formative evaluation. She refers to formative evaluation as the process of gathering information to advise design, production, and implementation. Discussing three case studies ("Sesame Street", "The Business Disc", and "Puppet Theater"), the book explains the phases (design, production, and implementation) of formative evaluation in each case. The book also states explicitly that formative evaluation is valuable in the decision-making process during the design of computer software and videodiscs, that it works in production settings, and that it facilitates decisions in the implementation process.

These two definitions show that formative evaluation is a process of collecting data to be used to judge the strengths and weaknesses of instruction in order to revise and improve programs, products, and materials. This judgment is a guideline for the researcher to improve the quality, effectiveness, and efficiency of the programs, products, and materials. It can also be used to decide whether they should be continued or cancelled, revised or changed, improved or discarded. Both books consider formative evaluation an important step, and one should understand that the continuation of the programs, products, and materials depends on its results.

Scriven (1967, in Tessmer, 1993, and Flagg, 1990) attached the name "formative evaluation" to a revision process that referred to an outcome evaluation of an intermediate stage in the development of a teaching instrument. Alongside Scriven's term, Tessmer (1993) mentioned other names for formative evaluation used by other experts, i.e. "try out", "developmental testing", "pilot study", "formative assessment", "dry run", "alpha/beta testing", "quality control", and "learner verification and revision". Tessmer preferred "quality control", but since that term does not describe the actual meaning of formative evaluation or the actual audience who will judge the effectiveness and quality of the products, "learner verification and revision" is a better name. But what is in a name? For formative evaluation, the process of conducting it is the most important thing, to be planned and carried out thoroughly.

Flagg (1990) gave no specific alternative name for formative evaluation and no reason why the name is included in each phase of evaluation, i.e. pre-production formative evaluation, production formative evaluation, and implementation formative evaluation. These names refer to the collection of information to guide decisions during the design, production, and implementation phases, respectively.

  3. When, why, and why not

Flagg (1990) mentioned that the only reason for performing formative evaluation is to inform the decision-making process during design, production, and implementation. The main reasons formative evaluation is needed are to understand the content, attitudes toward the content, interest in the content, and learners' experience with the medium in the design phase; to reduce expensive mistakes and improve user-friendliness in the production phase; and to restructure the products for different settings in the implementation phase. More particularly, formative evaluation is warranted for novice designers, for the implementation of new content, for the application of new technologies, for different target learners, for unfamiliar strategies, for the accuracy of critical performance, for large-scale dissemination, and when little chance of revision will be given (Tessmer, 1993).

Considering the importance of formative evaluation and analyzing several studies, Tessmer stated that there are three main results of doing formative evaluation in instruction. First, using formative evaluation in all types of instruction (computer-based, simulation, games, texts, and multimedia) can improve the learning effectiveness of materials. Second, even though there is not enough evidence on whether the instruction becomes more interesting or motivating, formative evaluation can be used to obtain criticism and suggestions on the interest or motivation the instruction holds for its users. Third, since practitioners already use some types of formative evaluation (under no specific name) in their projects, formative evaluation appears to be part of the "real world" of instructional design.

Through the three case studies, Flagg (1990) demonstrated the need for formative evaluation to inform the decision-making process during the design, production, and implementation stages of an educational program, with the purpose of improving the program. The Sesame Street staff conducted research on children's conceptual understanding of death in order to provide information useful in scripting the program. In the production phase, user observation gave the producers of The Business Disc feedback to improve user-friendliness before the instructional program reached a stage where changes were cost-prohibitive. Formative evaluation gave the developers of Puppet Theater feedback to reconfigure the program for a different context and users; they reworked the program and added tools in response to formative data from other users.

Even though formative evaluation is frequently used by practitioners, most organizations have not accepted it as part of their programs. They do not understand its purpose or utility because they think that formative evaluation is only for evaluating the finished product, for incompetent or inexperienced designers, or for understaffed evaluation teams. Flagg also mentioned six reasons why formative evaluation is neglected in the development of educational materials in electronic technologies. The major excuses concern time (pressure to produce by certain deadlines), money (a small percentage of the production budget), human nature (a perceived constraint on creativity), unmet expectations (unrealistic expectations), measurement difficulties (long-term objectives are difficult to measure), and lack of knowledge (unawareness of the philosophy and methods). In addition, formative evaluation is not worthwhile if those in control of the project disagree with its philosophy, if the developers cannot agree on the goals of the program and the intended audience, or if there is no chance for change.

  4. Stages

Tessmer (1993) suggested four classically recognized types of formative evaluation:

  1. Expert review: experts review the instruction, with or without the evaluator
  2. One-to-one: one learner at a time reviews the instruction with the evaluator and comments on it
  3. Small group: the evaluator tries out the instruction with a group of learners and records their performance and comments
  4. Field test: the evaluator observes the instruction being tried out in a realistic situation with a group of learners.

To convey the concept of formative evaluation, Tessmer drew two figures of the stages of formative evaluation; the figure below represents a synthesis of both. With only brief explanation of how to apply self-evaluation, Tessmer suggested conducting expert review and one-to-one evaluation together after self-evaluation, revising the instruction, conducting a small-group evaluation, revising the instruction again, holding the field test, and revising and improving the instruction one last time. One can apply variations of those types within the four steps, such as expert panels (a team of experts and the evaluator), two-to-one (two learners review the instruction with the evaluator), and rapid prototyping (immediate field-test evaluation).

Unfortunately, Tessmer did not mention whether or how those types can be combined in each step of an evaluation plan. Another problem Tessmer did not address is what should be done, when expert review and one-to-one evaluation are conducted together, if the expert and the one-to-one evaluation disagree about some factor or aspect of the prototype. To reduce this kind of confusion, this paper suggests conducting the evaluation carefully, one step at a time. Nevertheless, whenever the resources (for instance time, money, and manpower) offer an opportunity to go back to a previous step, one can do so; Tessmer's book does not mention this possibility. He assumed that whoever follows the steps precisely, considering the suggestions in the book thoroughly, will not have difficulties or make unreasonable errors in doing formative evaluation. This is why he states that formative evaluation depends on the thoroughness of the plan.

Another possible way of combining Tessmer's steps is to decide early what kind of information one wants to gain, and in which step or from whom that information should be gained. Designing this kind of information plan is advantageous for combining the steps, as long as the resources (time, money, and manpower) are available, can be used effectively and efficiently, and suffice to accomplish the implementation of the steps.

Flagg (1990) explains the stages of formative evaluation by considering both the development of the program and the evaluation steps themselves. The following describes formative evaluation for television, software, and videodisc.

Program development phases, with the corresponding evaluation phases:

Needs Assessment → Pre-production Formative Evaluation → Production Formative Evaluation → Implementation Formative Evaluation
Drawing on other experts, Flagg (1990) describes the first evaluation phase as needs assessment or front-end analysis, which establishes the rationale for the program, its content, and the feasibility of the delivery system. The second phase, pre-production formative evaluation, conceptualizes the planning phase to guide the pre-production of the program in the form of preliminary scripts. The third phase, production formative evaluation, revises the early program versions with the target group. The implementation formative evaluation phase examines how the program operates with the target learners in the environment for which it was designed. These phases are described explicitly in chapters 4-7, with chapter 8 devoted solely to implementation formative evaluation.

These two similar but distinct sets of stages describe each step in detail but with different purposes: Tessmer explained each step for conducting formative evaluation in instructional design in general, while Flagg described the steps for evaluating instructional technology formats. In general, the two sets of stages use similar approaches, questions, points of attention, and measurement tools in conducting formative evaluation.

  5. Plan

Tessmer suggested that the formative evaluation process can be carried out through the following steps: determining the goals of, resources for, and constraints upon the evaluation, conducting a task analysis, describing the learning environment, determining the media characteristics, outlining the information sought, selecting data-gathering methods and tools, and implementing the evaluation. Tessmer described each step not only by giving the questions that should be considered in doing formative evaluation, but also by constructing some answers to them. For instance, the answers to "what do you want to find out from the evaluation?" are learning effectiveness, learner interest/motivation, content quality, technical quality, and implementability.

For conducting formative evaluation in instructional technologies, Flagg gave four criteria, i.e. usability (usable for decision making), practicality (answers obtainable within the time and money available), importance (relevant to the objectives and situation), and uncertainty (addressing questions whose answers are genuinely uncertain). The methods used in formative evaluation can follow the hypothetico-deductive paradigm (a top-down approach, or theory-based hypothesis) to confirm or explore causal relationships between or among variables, or the inductive paradigm (a bottom-up approach), which begins with the collection of qualitative and quantitative data directed by the evaluation question.

Giving much attention to each step of formative evaluation (22 pages for expert review, 29 for one-to-one, 35 for small group, and 16 for field test), Tessmer described in detail how to conduct each one. The expert review is conducted to evaluate the clarity of objectives and content, the practicality (technical quality) of the prototype, and the validity of the materials by using connoisseurial or expert judgment. Considering many aspects, Tessmer stated that in expert review there are many types of experts, and many types of questions that should or should not be asked of each one, depending upon each expert's strengths and the goals of the evaluation. To make the expert review more thorough, Tessmer gave an example at the end of the chapter.

The advantages of one-to-one evaluation are that it is interactive and highly productive; easy, quick, and inexpensive; and a rich source of revision information on the clarity of instruction and directions, the completeness of the instruction, and the adequacy of the materials' quality. Small-group evaluation assesses the effectiveness, appeal, and implementability of the approach. It also gives the study many advantages: it is inexpensive and easy to conduct, yields more accurate measures of teachers' performance, and leads to more improvement in the instructional prototype. Field-test evaluation is conducted to describe the teacher acceptance, implementability, and organizational acceptance of the prototype approach. It can be used to confirm the revisions made in previous formative evaluations, to generate final revision suggestions, to investigate the effectiveness of the prototype instruction, and to obtain a polished version of the products and programs. Tessmer also provided an example in each of the chapters on one-to-one, small-group, and field-test evaluation.

Drawing again on other experts, Flagg (1990) describes the first evaluation phase as needs assessment or front-end analysis, which establishes the need for the program, its content, and the feasibility of the delivery system. The data gathering entails reviews of existing studies, tests, and curricula, expert reviews, and measurement of target-audience characteristics. The second phase, pre-production formative evaluation, is where the conceptualization of the planning phase guides the pre-production of the program in the form of preliminary scripts or writers' notebooks. In electronic learning projects, researchers include the target audience and teachers in the process of making design decisions about content, objectives, and production formats, while expert reviews (of content and design) are used to guide the creativity of the designers and reduce uncertainty in some critical decisions. The third phase, production formative evaluation, is where the program is revised in light of feedback from tryouts of early program versions with the target group. Information on user-friendliness, comprehensibility, appeal, and persuasiveness can give the production team confidence in their revisions and decisions, and the subject-matter specialists, designers, and other experts can work together to improve the versions. The implementation formative evaluation phase is concerned with how the program operates with target learners in the environment for which it was designed. Field testing helps designers see how program managers will really use their final products with target learners, and the feedback aids the development of support materials and future programs. This phase differs from summative evaluation, since the latter measures learners who have not yet been exposed to the program.

While Tessmer gave a specific method for doing formative evaluation, along with guidelines for collecting and analyzing data in each step, Flagg described many alternative measurements, such as self-report (the respondent answers questionnaires or interviews), observation (rendering an obtrusive and objective record), tests, and records (collections of data). Even though there is no elaboration of "how" to conduct these measures, Flagg, quoting Mielke (1973), stated that the superiority or inferiority of a research method cannot be established as an inherent quality, but only in terms of its performance in answering questions.

In order to design sophisticated software, one must consider user-friendliness (accessibility, responsiveness, flexibility, and memory), reception (attention, appeal, and excitement), and outcome effectiveness (the motor skills, cognitive abilities, or attitudes the learners have to learn). Methods such as observation and mechanical recording devices will yield valuable information relevant to the accessibility, responsiveness, and flexibility of computer-based educational programs, as will self-report (think-aloud, escorted trial, and diary) and tests. Elaborating at length only on visual attention, the book details each of these self-report methods, especially the program evaluation analysis computer (PEAC) system. The book also does not explain why only self-report methods are suitable for measuring these variables.

  6. Miscellaneous

In general, the two books give different names to similar stages of formative evaluation. Tessmer gave his attention to instructional design in general, while Flagg focused on instructional technologies. Tessmer described each stage in detail and gave an example at each stage, whereas Flagg gave five examples and described each stage within each example explicitly. The following table gives some more general differences between, and statistics of, the two books:


                     Flagg's book                 Tessmer's book
    Examples         5, in chapters 4-8           3 in chapter 2; 1 in each of chapters 3-6
    Publisher        Lawrence Erlbaum             Kogan Page
    Year of issue    1990                         1993
    Summaries        in each chapter              at the end

  7. Recommendation

The first weakness of Flagg's book is the structure of its content. Since there is no guidance on how to use or read the book, one should read the introduction (chapter 1) carefully in order to get an overview of the book's content. Since the book is meant to give students, practitioners, researchers, designers, and developers an in-depth view of the formative evaluation process through many examples, they should spend more time on the first chapter, then go to the summary of each chapter (chapters 5, 6, and 8 have no summary), and then begin to read the book in detail. The book is difficult for beginners to understand but easy to read for experienced readers. The reason could be that no formative evaluation was conducted on the book itself before it was published.

For beginners, Tessmer's book is very useful and understandable because it is well structured. This is so not merely because the book has only six chapters, but perhaps because expert review and field testing were conducted before it was published. Even without reading "A Note to my Readers about this Book", one can understand what, how, when, why, and why not to do formative evaluation in instructional design. The book can usefully be applied not only to educational instructional design but also to instructional technology. It reads like a "cookbook": one can read, use, plan, create, and implement formative evaluation by following all the steps, strategies, questions, and suggestions included in the book.

For conducting developmental research, these books can be useful resources. They describe in detail how to carry out the evaluation, how to collect data, and how to analyze it in order to improve instructional design or instructional technologies. Using the books as guidelines for conducting formative evaluation is advantageous, but most important of all, the individual should have a good, strong attitude toward conducting evaluation. Not only will this motivate the person to do it, it will also help the person understand, collect, compare, and analyze the data honestly and thoroughly in order to obtain the best account of the phenomena under evaluation. Formative evaluation demands a fair, truthful, and honest attitude in the person conducting it; otherwise the result is only GIGO (garbage in, garbage out).




References

Flagg, Barbara N. 1990. Formative evaluation for educational technologies. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Krathwohl, David R. 1998. Methods of educational & social science research: An integrated approach. New York: Addison-Wesley Longman, Inc.

Tessmer, Martin. 1993. Planning and conducting formative evaluation. London: Kogan Page Limited.