HEAP Item Bank Details

Ongoing collaborative item development provides a continual supply of new items.
In 2009-2010, 163 new items were developed that assess higher levels of cognitive complexity.

The item bank covers elementary, middle, and high school levels and includes:

  1. 2,096 Individual Assessment Items aligned to the National Health Education Standards
  2. 232 Scored Student Exemplars including Rubric and Scoring Criteria
  3. 23 Anchor Paper sets with Practice Papers and Practice Scoring Key

Basic Use of the HEAP Item Bank by Classroom Teachers:

  1. Search and view items in the item bank
  2. Select desired items for assessment development
  3. Access and view scoring guides, rubrics, and scored student work
  4. Create assessments and download for paper/pencil or deliver online
  5. Customize/edit tests to meet desired purposes

Advanced Features of the HEAP Item Bank:

  1. Easy and efficient online testing that scales from classroom to statewide assessments
  2. Immediate reporting of assessment results at the student, class, school, district, and state levels
  3. Collaborative item development with a workflow and review process
  4. Easy field-testing of new items
  5. Easy import and export of test data (see the sketch below)
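
As a rough illustration of the multi-level reporting and data import/export features above (items 2 and 5), the following is a minimal sketch in Python. It is hypothetical: the item bank's actual export format, file names, and column names are not documented here and are assumptions.

    # Hypothetical sketch: roll assessment results up from classroom to state
    # level. Assumes a CSV export with one row per student response and columns
    # student_id, class_id, school_id, district_id, state, item_id, score.
    import pandas as pd

    results = pd.read_csv("heap_results_export.csv")  # assumed file name

    # Response count and mean score at each reporting level
    for level in ["class_id", "school_id", "district_id", "state"]:
        summary = results.groupby(level)["score"].agg(["count", "mean"])
        print(f"\n--- Results by {level} ---")
        print(summary)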

Module Development


The project has developed two types of assessment units: modules and performance tasks. A typical module consists of eight selected-response and three constructed-response (two short answer and one extended response) items. All modules and performance tasks were developed around one of the descriptors in the Assessment Framework.
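
As a quick sketch (illustrative only; the project itself did not publish code), the module blueprint described above can be written as a simple data structure:

    # Module blueprint from the text: 8 selected-response (SR) items plus
    # 3 constructed-response items (2 short answer, 1 extended response).
    from dataclasses import dataclass

    @dataclass
    class ModuleBlueprint:
        selected_response: int = 8
        short_answer: int = 2
        extended_response: int = 1

        @property
        def total_items(self) -> int:
            return self.selected_response + self.short_answer + self.extended_response

    print(ModuleBlueprint().total_items)  # 11 items in a typical module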

Different types of items have been developed for the HEAP:

  1. Selected Response
    Commonly called multiple choice (MC), these items can be scored by machines. These items are also known by the acronym SR.
  2. Short Constructed Response
    Also known as short answer (SA), these items require the student to write a short answer. Constructed-response items are performance-based assessments.
  3. Extended Response
    The assessment question requires a longer response. Also known by the acronym ER, these items are performance-based assessments.
  4. Performance Tasks
    Performance tasks are curriculum-embedded projects that students complete in or outside of class over an extended period of time. They are performance-based assessments.

Item Development

How the Assessment Items Were Developed

A key goal of the CCSSO-SCASS HEAP was to develop high-quality assessment items. The project developed selected-response items (i.e., multiple-choice), constructed-response items (i.e., short answer and extended response), and performance tasks. Performance tasks are curriculum-embedded projects grounded in authentic student experiences. They support skills-based, standards-based classroom instruction.

To help ensure the quality of the items, the project followed a rigorous process during item development. ACT, Inc., provided support and guidance as a subcontractor to the project. This section focuses on the process used to develop approximately 1,400 assessment items in Phase II.

The project developed two types of assessment units: modules and performance tasks. Each module consisted of eight selected-response and three constructed-response (two short answer and one extended response) items. All modules and performance tasks were developed around one of the descriptors in the HEAP's Assessment Framework; the descriptors identify the health content that is linked to specific skills for student assessment.

Development of Modules and Performance Tasks

Participating states were asked to recruit item writers to develop the modules and performance tasks. The item writers, primarily practicing teachers and curriculum specialists, attended workshops where ACT, Inc., assessment development staff provided training in best practices for item writing and gave the writers guidelines for drafting materials.

Following the training workshop, writers were given time to draft materials. The draft materials were then shared in a conference with both a health education specialist and an assessment specialist. In the conference, item writers were provided feedback on the materials they had developed.

Item writers continued their work, completed the first drafts of their assignments, and submitted their materials for review. A critique of the draft materials, consisting of suggestions from both a health consultant and an assessment consultant, was sent to each item writer. The critiques offered advice on how to revise materials further. Once revised drafts were received, staff from ACT, Inc., edited the material in preparation for external reviews.

Item Reviews

The next step required review of all of the assessment items by a large number of professionals. All items were reviewed from both public health and school health perspectives by health educators, representatives from the member states, assessment experts, individuals from CDC/DASH, and members of the HEAP Technical Advisory Group. Reviewers were asked to evaluate items based on a number of criteria, including accuracy, grade appropriateness, and fairness. Staff from ACT, Inc., and the HEAP then reviewed all comments and revised the items as necessary.

Pilot Testing and Scoring

As part of the project, the assessment items were informally pilot tested. Participating states recruited approximately 1,600 classrooms in elementary, middle, and high schools for this pilot test. The following table shows the states and counties that participated in the pilot tests.

States and Counties Involved in the Pilot Tests:

  • Alaska
  • Arkansas
  • Broward County, FL
  • California
  • Colorado
  • Connecticut
  • Delaware
  • Hawaii
  • Kansas
  • Kentucky
  • Maine
  • Massachusetts
  • Michigan
  • Montana
  • New York
  • North Carolina
  • Oregon
  • Rhode Island
  • South Carolina
  • South Dakota
  • Vermont
  • Washington
  • West Virginia
  • Wisconsin
  • Wyoming

Following the pilot tests, all selected-response and constructed-response items were scored. Professionally trained readers scored the students' responses. Item statistics were prepared and analyzed to serve as the basis for a final review of the materials. Project staff and state representatives reviewed all items again. Following this review, the items and constructed-response scoring criteria were revised and delivered to CCSSO.


Item Statistics

As part of the development process, the items were submitted to a multistage review. Reviewers included members of the State Representative Team, Steering Committee, and CDC/DASH. Following the reviews, the HEAP assessment items were pilot tested in classrooms. The selected-response and constructed-response items were then scored, and summary statistics were generated for each item. The items, along with the item statistics, were reviewed once more before being released to the HEAP member states.
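
The document does not spell out how these summary statistics were computed. In classical item analysis, the per-item summary typically includes difficulty (the mean score, or proportion correct for machine-scored items) and discrimination (the correlation between the item and the rest of the test). The following is a minimal sketch under that assumption, not a description of HEAP's actual computations.

    # Sketch of classical item statistics; the exact statistics HEAP reported
    # are an assumption, not taken from this document. Assumes a CSV of scores
    # with one row per student and one column per item (SR items scored 0/1,
    # CR items on the 0-4 rubric scale).
    import numpy as np

    scores = np.loadtxt("pilot_scores.csv", delimiter=",")  # assumed file name

    for j in range(scores.shape[1]):
        item = scores[:, j]
        rest = scores.sum(axis=1) - item       # total score excluding this item
        difficulty = item.mean()               # proportion correct for 0/1 items
        discrimination = np.corrcoef(item, rest)[0, 1]  # item-rest correlation
        print(f"Item {j + 1}: difficulty={difficulty:.2f}, "
              f"discrimination={discrimination:.2f}")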

The statistics from the pilot tests are available in Table View. The first eight columns in Table View contain information that identifies the item (item number, module number, module title, grade, item type, answer key, core concept, and skill). The statistics are provided in an additional 14 columns to the right of this information, so you may need to scroll right by using the bar near the bottom of the screen to view the item statistics.

Item statistics are available for core concept, skill, or both; not all statistics are appropriate for all items. In the item bank, core concept and skill statistics are distinguished by a prefix: CC for core concept and Skill for skill. For example, the sample size (N) for core concept is presented as CC N, and the sample size for skill is presented as Skill N.
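
To separate the two families of columns in a spreadsheet export of Table View, one could filter on those prefixes. The sketch below is hypothetical: it assumes the export is a CSV whose headers follow the prefix convention just described (e.g., CC N, CC Mean, Skill N, Skill Mean).

    # Assumes a CSV export of Table View with headers such as "CC N",
    # "CC Mean", "Skill N", "Skill Mean" (hypothetical names following
    # the prefix convention described above).
    import pandas as pd

    table = pd.read_csv("table_view_export.csv")  # assumed file name

    core_concept_stats = table[[c for c in table.columns if c.startswith("CC ")]]
    skill_stats = table[[c for c in table.columns if c.startswith("Skill ")]]

    print(core_concept_stats.head())
    print(skill_stats.head())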

When statistics are not provided for a specific item, it means that the item was significantly revised on the basis of the final reviews.

Assessment Terms

This section includes examples of HEAP assessment items to illustrate some of the terms.

Item Pool or Item Resource Bank
Terms used to describe the collection of assessment items developed for the HEAP. The types of items are selected response, constructed response (short and extended answer), and performance tasks.

Selected Response
Commonly called multiple-choice, these items can be scored by machines. These items are also known by the acronym SR.

Selected Response Example

Grade Level: Middle School
Content Area: Tobacco

Which of the following is NOT a good example of a refusal skill if someone offers you tobacco?

A. Say "no" and walk away
B. Start an argument
C. Say "no" and change the subject
D. Keep repeating that you're not interested

Short Constructed Response

Also known as short answer (SA), these items require the student to write a short answer. Constructed-response items are performance-based assessments.

Short Constructed Response Example

Grade Level: High School
Content Area: Nutrition

Plan a healthful picnic lunch menu for a hot, sunny day. Describe how you will pack the foods for your picnic and explain why you have chosen these packing methods.

Extended Constructed Response

The assessment question requires a longer response. Also known by the acronym ER, these items are performance-based assessments.

Extended Constructed Response Example

Grade Level: High School
Content Area: Nutrition

Family and consumer science students have been asked to create posters about food safety that can be posted in their classrooms. Create a poster that lists five rules that should be followed when food is prepared, stored, and/or served. For each rule, explain why it is important to follow the rule. Be sure to make your poster persuasive.

Performance Tasks

Performance tasks are curriculum-embedded projects that students complete in or outside of class over an extended period of time. They are performance-based assessments.

Performance Task Example

Grade Level: High School
Content Area: Nutrition

Students are asked to demonstrate their understanding of the types of nutrition resources that are available to low-income families in their community. They do so by researching a number of resources, conducting interviews, compiling their information, and presenting that information in a written report.


Anchors and Exemplars

Anchor papers are scored student responses used to establish the points for the scoring rubric. (Scoring rubrics are a set of guidelines for scoring performance-based student work.) Anchor papers are the student responses chosen to represent the mid-point of each level of performance described in the rubric. They literally anchor the rubrics and are essential to the scoring system.

Exemplars are previously scored examples of student work used in training to illustrate the four score points. For each score point, many different exemplars illustrate the various types of responses students can produce for a given stimulus, as in the example that follows.

Exemplar Example Prompt

Grade Level: High School
Content Area: Mental Health

Student Challenge

Your challenge is to examine a mental health problem, such as depression and suicide, stress, or eating disorders. You will analyze the causes and symptoms of the problem, explore treatment options, and then provide a listing of community resources that might be helpful in treating the problem. Your final product will be a brochure or presentation, including a list that actually can be used as a resource by individuals needing help or information to enhance their mental health.

Assessment Criteria

You will be assessed on your ability to demonstrate concepts and skills to help cope with the stresses of everyday living. Your project must include appropriate strategies for managing stress.

The following provides an example of scored student work.

Exemplar Example: Student Response

(The scored student response appears as images in the original document and is not reproduced here.)


Scoring Rubrics

The HEAP has developed a holistic scoring system that applies to all of the performance-based assessment items developed for the project. HEAP uses a four-point scale for scoring both content and skills. For example, a score of four for concepts reflects a complex, accurate, and comprehensive response that shows breadth and depth of information, describes relationships, and draws conclusions. A score of one reflects little or no accurate information about the relationships between health concepts.
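
To make the structure of the scale concrete, the sketch below represents the four score points as a lookup table. Only the descriptors for scores 4 and 1 are given above, so the middle score points are left as placeholders rather than invented.

    # Four-point holistic scale for concepts, paraphrased from the text.
    # Scores 3 and 2 are placeholders: their descriptors are not given here.
    CONCEPT_RUBRIC = {
        4: "Complex, accurate, comprehensive response showing breadth and "
           "depth of information; describes relationships and draws conclusions",
        3: "(descriptor not given in this excerpt)",
        2: "(descriptor not given in this excerpt)",
        1: "Little or no accurate information about the relationships "
           "between health concepts",
    }

    def describe(score: int) -> str:
        """Return the rubric descriptor for a 1-4 concept score."""
        return CONCEPT_RUBRIC.get(score, "invalid score")

    print(describe(4))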