Nine Hole Peg Test (NHPT) – Strokengine (2024)

Purpose

The Nine Hole Peg Test (NHPT) was developed to measure finger dexterity, also known as fine manual dexterity. It can be used with a wide range of populations, including clients with stroke. Additionally, the NHPT is a relatively inexpensive test and can be administered quickly.

In-Depth Review

Purpose of the measure

The Nine Hole Peg Test (NHPT) was developed to measure finger dexterity, also known as fine manual dexterity. It can be used with a wide range of populations, including clients with stroke. Additionally, the NHPT is a relatively inexpensive test and can be administered quickly.

The NHPT should be used in association with other upper extremity performance tests, in order to estimate upper limb function with more accuracy.

Available versions

The NHPT was first introduced by Kellor, Frost, Silberberg, Iversen, and Cummings in 1971. In 1985, norms for the NHPT in healthy individuals were established by Mathiowetz, Weber, Kashman, and Volland.

Features of the measure

Items:

The NHPT is composed of a square board with 9 pegs. At one end of the board are holes for the pegs to fit into, and at the other end is a shallow round dish to store the pegs. The NHPT is administered by asking the client to take the pegs from a container, one by one, and place them into the holes on the board as quickly as possible. Clients must then remove the pegs from the holes, one by one, and return them to the container. In order to practice and register baseline scores, the test should begin with the unaffected upper limb. The board should be placed at the client’s midline, with the container holding the pegs oriented towards the hand being tested. Only the hand being evaluated should perform the test. The hand not being evaluated is permitted to hold the edge of the board in order to provide stability (Mathiowetz et al., 1985; Sommerfeld, Eek, Svensson, Holmqvist, & Arbin, 2004).

Scoring:

Clients are scored based on the time taken to complete the test activity, recorded in seconds. The stopwatch should run from the moment the participant touches the first peg until the moment the last peg hits the container (Grice, Vogel, Le, Mitchell, Muniz, & Vollmer, 2003; Mathiowetz et al., 1985).

Mathiowetz et al. (1985) reported that on average, healthy male adults complete the NHPT in 19.0 seconds (SD 3.2) with the right hand, and in 20.6 seconds (SD 3.9) with the left hand. For healthy female adults, the NHPT was completed in 17.9 seconds (SD 2.8) and 19.6 seconds (SD 3.4) with the right and left hand, respectively.
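As a rough illustration of how these norms might be used, the sketch below expresses a client's completion time as a number of standard deviations from the healthy-adult mean. The means and SDs are the overall adult values quoted above; the dictionary layout, the function name `nhpt_z_score` and the example time of 25.4 seconds are assumptions made for this example, not part of the published procedure.

```python
# Mean and SD completion times (seconds) for healthy adults, as quoted
# above from Mathiowetz et al. (1985). The layout is illustrative only.
NORMS = {
    ("male", "right"): (19.0, 3.2),
    ("male", "left"): (20.6, 3.9),
    ("female", "right"): (17.9, 2.8),
    ("female", "left"): (19.6, 3.4),
}


def nhpt_z_score(time_s, sex, hand):
    """Return how many SDs a time lies from the healthy-adult mean.

    Positive values mean the client was slower than the mean.
    """
    mean, sd = NORMS[(sex, hand)]
    return (time_s - mean) / sd


# Hypothetical client: a man who needs 25.4 s with the right hand.
z = nhpt_z_score(25.4, "male", "right")
print(f"z = {z:.1f}")  # z = 2.0
```

A z-score of this kind only locates a time relative to the quoted averages; it is not a clinical cutoff.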

Alternative scoring – the number of pegs placed in 50 or 100 seconds can be recorded. In this case, results are expressed as the number of pegs placed per second (Jacob-Lloyd, Dunn, Brain, & Lamb, 2005; Sunderland, Trinson, Bradley, & Langton-Hewer, 1989).
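The alternative score is a simple rate. A minimal sketch, assuming the rate is the number of pegs placed divided by the time limit used (the function name `pegs_per_second` and the input checks are hypothetical additions):

```python
def pegs_per_second(pegs_placed, time_limit_s):
    """Alternative NHPT score: pegs placed per second of the time limit."""
    if not 0 <= pegs_placed <= 9:
        raise ValueError("the NHPT has 9 pegs")
    if time_limit_s <= 0:
        raise ValueError("time limit must be positive")
    return pegs_placed / time_limit_s


# Hypothetical client who places 6 pegs within a 50-second limit.
print(f"{pegs_per_second(6, 50):.2f} pegs/s")  # 0.12 pegs/s
```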

Time:

Not typically reported. The norms reported above indicate approximate testing times in healthy adults.

Subscales:

None

Equipment:

The standardized equipment consists of:

A board, in wood or plastic, with 9 holes (10 mm diameter, 15 mm depth), spaced 32 mm apart (Mathiowetz et al., 1985; Sommerfeld et al., 2004) or 50 mm apart (Heller, Wade, Wood, Sunderland, Hewer, & Ward, 1987).
A container for the pegs. Initially the container was a square box (100 x 100 x 10 mm) separate from the board. The most current container is a shallow round dish at the end of the board (Grice et al., 2003).
9 pegs (7 mm diameter, 32 mm length) (Mathiowetz et al., 1985).
Stopwatch.

Training:

None typically reported.

Alternative forms of the Nine Hole Peg Test

None.

Client suitability

Can be used with:

Clients with stroke.
Clients should have a satisfactory level of upper limb fine motor skills as they must be able to pick up the pegs to complete the test.

Should not be used in:

The NHPT cannot be used with clients who have severe upper extremity impairment.
The NHPT cannot be used with clients with severe cognitive impairment.
Scoring with an upper time limit of 50 or 100 seconds requires caution, especially in the acute post-stroke period, due to the possibility of floor effects (Jacob-Lloyd et al., 2005; Sunderland et al., 1989).

In what languages is the measure available?

There are no official translations of the NHPT.

Some publications from the Netherlands, Japan and Sweden have used the NHPT as an outcome measure, which shows its use in languages other than English (Dekker, Van Staalduinem, Beckerman, Van der Lee, Koppe, & Zondervan, 2001; Hatanaka, Koyama, Kanematsu, Takahashi, Matsumoto, & Domen, 2007; Sommerfeld et al., 2004).

Summary

What does the tool measure?	Finger dexterity.
What types of clients can the tool be used for?	The NHPT can be used with, but is not limited to, clients with stroke. There are no restrictions when administering it to clients with chronic stroke. With clients with acute stroke, the mode of scoring should be chosen carefully in order to avoid floor effects.
Is this a screening or assessment tool?	Assessment
Time to administer	The amount of time it takes to administer the NHPT has not been reported and it will vary according to the client’s impairment or the mode of scoring.
Versions	There are no alternative versions.
Other Languages	There are no official translations.
Measurement Properties
Reliability	Internal consistency: No studies have examined the internal consistency of the NHPT.
Intra-rater: Three studies have examined the intra-rater reliability of the NHPT. Two reported excellent intra-rater reliability and one reported adequate intra-rater reliability using correlation coefficients. One study used Spearman rho and the two others, Pearson correlation.
Inter-rater: Three studies have examined the inter-rater reliability of the NHPT and reported inter-rater reliability using correlation coefficients. One study used Spearman rho and the two others, Pearson correlation.
Validity	Criterion:
Concurrent: Two studies have examined the concurrent validity of the NHPT. The first study examined the sensitivity of the NHPT, comparing it to the Frenchay Arm Test as the gold standard, and reported that the NHPT has low sensitivity, with 27% of results misclassified. The second study examined the concurrent validity of the NHPT and reported adequate to excellent correlation with the Box and Block Test (BBT) and the Action Research Arm Test (ARAT) at pre- and post-treatment.
Predictive: One study has examined predictive validity and reported that the NHPT is not able to predict functional outcomes six months after stroke.
Construct:
Convergent: One study has examined convergent validity of the NHPT and reported excellent correlations between the NHPT and the Motricity Index using Pearson correlation coefficients.
Floor/Ceiling Effects	Two studies have examined floor effects of the NHPT. In both studies, clients were scored based on a cutoff of 50 or 100 seconds. Participants not able to complete the test within this time were scored as 0. In both studies, at earlier phases of stroke, floor effects were poor or adequate. Six months after stroke, the floor effects were adequate.
Does the tool detect change in patients?	Two studies have examined the ability to detect change of the NHPT and reported that the NHPT is able to detect change.
Acceptability	The NHPT should not be used with clients with severe upper extremity impairment or with those who are not able to pick up the pegs.
Feasibility	The administration of the NHPT is quick and simple; however, it requires standardized equipment. One study has examined the feasibility of the NHPT and reported that, on average, 52% of clients with acute stroke were not able to perform the NHPT (Jacob-Lloyd et al., 2005).
How to obtain the tool?	The NHPT instructions can be obtained in the study by Mathiowetz et al. (1985). Also, a version of the measure can be obtained from the publication by Wade (1992). Davis et al. (1999) reported that the most used standardized equipment for the NHPT in the United States is produced by Smith and Nephew Rehabilitation, Inc. and Sammons Preston. Standardized equipment can be obtained at the website: http://www.sammonspreston.com/Supply/Product.asp?Leaf_Id=A8515

Psychometric Properties

Overview

We conducted a literature search to identify all relevant publications on the psychometric properties of the Nine-Hole Peg Test (NHPT) in two different populations – healthy normal subjects and individuals with stroke. We identified seven publications. The results of these suggest that the NHPT may be a reliable, valid and responsive measure in clients with stroke. In clients with acute stroke, the NHPT needs to be used carefully due to the possibility of floor effects.

In a literature review, Croarkin, Danoff, and Barnes (2004) identified the level of evidence for nine upper extremity motor function tests. The level of evidence was established based on the total number of psychometric properties addressed in studies of each test. Compared to the Action Research Arm Test (Lyle, 1981), Chedoke-McMaster Stroke Assessment (Gowland, VanHullenaar & Torresin et al., 1995), Fugl-Meyer Sensorimotor Assessment (Fugl-Meyer, Jääskö, Leyman, Olsson & Steglind, 1975), Modified Motor Assessment Chart (Lindmark & Hamrin, 1988), Motor Assessment Scale (Carr, Shepherd, Nordholm & Lynne, 1985), Motor Club Assessment (Ashburn, 1982), Motricity Index (Demeurisse, Demol & Rolaye, 1980) and Rivermead Motor Assessment (Lincoln & Leadbitter, 1979), the NHPT was found to have the greatest number of psychometric properties supported, with studies on intra-rater reliability, inter-rater reliability, convergent validity and predictive validity.

Floor/Ceiling Effects

Jacob-Lloyd, Dunn, Brain, and Lamb (2005) examined the ceiling and floor effects of the NHPT in 50 persons with stroke. Participants were assessed twice within a 6-month interval. The first assessment was at hospital discharge. In this study, participants were scored based on the cutoff of 100 seconds. Those who took more than 100 seconds to complete the test were scored as 0. At discharge, the NHPT demonstrated an adequate floor effect, with less than 20% of the participants scoring the minimal value. After 6 months, the number of participants scoring the minimal value decreased, with the NHPT still demonstrating an adequate floor effect.

Sunderland, Trinson, Bradley, and Langton-Hewer (1989) examined the presence of a floor effect in 31 participants with stroke. Assessments were performed at four points in time: admission, 1, 3 and 6 months post-stroke. Participants were given 50 seconds to complete the test. Those who were not able to complete the test within this time limit were scored as 0. Initially, the NHPT demonstrated a poor floor effect, with 65% of participants scoring 0; this proportion had decreased by the 6-month follow-up.
Note: No values were provided by the authors for the 6 month follow-up.
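The floor-effect figures in this section are percentages of participants who received the minimal score of 0 under a time cutoff. A minimal sketch of that calculation follows, using made-up completion times (`None` marks a participant who could not finish). The function names are hypothetical. The 20% boundary between "adequate" and "poor" is an assumption inferred only from the two results described above (less than 20% rated adequate, 65% rated poor).

```python
def percent_at_floor(times_s, cutoff_s):
    """Percentage of participants scored 0: unable to finish or over the cutoff."""
    at_floor = sum(1 for t in times_s if t is None or t > cutoff_s)
    return 100.0 * at_floor / len(times_s)


def rate_floor_effect(percent):
    """Label a floor effect. The 20% boundary is an assumption (see lead-in)."""
    return "adequate" if percent < 20 else "poor"


# Made-up completion times in seconds for ten participants.
times = [32.0, 48.5, None, 95.0, 75.2, 41.0, 66.3, 58.9, 39.4, 90.1]

for cutoff in (100, 50):
    pct = percent_at_floor(times, cutoff)
    print(f"cutoff {cutoff} s: {pct:.0f}% at floor ({rate_floor_effect(pct)})")
# cutoff 100 s: 10% at floor (adequate)
# cutoff 50 s: 60% at floor (poor)
```

With the same made-up times, the shorter cutoff places many more participants at the minimal score, which is why the choice of time limit matters in the acute period.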

Reliability

Note: A number of the publications on reliability reviewed below used statistical analyses such as Pearson’s correlation coefficient that are not considered the analyses of preference for testing reliability and may artificially inflate reliability coefficients. Future studies should examine the reliability
of the NHPT using ICC or Kappa statistics.
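To illustrate why a Pearson coefficient can overstate reliability, the following minimal Python sketch compares Pearson's r with an absolute-agreement ICC (the two-way random-effects, single-measure form often labelled ICC(2,1)) on made-up NHPT times in which every participant is exactly 3 seconds faster at retest. The data, the function names and the choice of ICC form are illustrative assumptions and are not taken from any of the studies reviewed here.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores holds one row per participant and one column per session or rater.
    """
    n, k = len(scores), len(scores[0])
    values = [v for row in scores for v in row]
    grand = mean(values)
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-participant variance
    ms_cols = ss_cols / (k - 1)               # between-session variance
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual variance
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical NHPT completion times in seconds (not real study data).
session_1 = [18.0, 20.0, 22.0, 25.0, 30.0]
session_2 = [t - 3.0 for t in session_1]  # uniform 3-second practice effect

r = pearson_r(session_1, session_2)
icc = icc_2_1(list(zip(session_1, session_2)))
print(f"Pearson r = {r:.2f}, ICC(2,1) = {icc:.2f}")
```

With these made-up numbers Pearson's r is 1.0 because the ordering and spacing of the scores are preserved, while the ICC is about 0.83 because it also penalizes the systematic shift between sessions.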

Test-retest:
No studies were identified examining the test-retest reliability of the NHPT.

Intra-rater:
Heller, Wade, Wood, Sunderland, Hewer, and Ward (1987) examined the intra-rater reliability of the NHPT, Frenchay Arm Test (Heller et al., 1987), Finger Tapping Rate (Lezak, 1983), and Grip Strength (Mathiowetz, Kashman, Volland, Weber, Dowe, & Rogers, 1985) in 10 patients with chronic stroke. Participants were re-assessed after a 2-week interval by the same rater. The results describe the range of reliability across the four measures mentioned above; values for each individual measure were not provided. Spearman rho correlation coefficients were excellent (ranging for all four measures from r = 0.68 to 0.99).
Note: Although it is not possible to discern the exact value for the NHPT’s reliability, all values were considered excellent and statistically significant, suggesting that the NHPT may be reliable in clients with stable stroke.

Mathiowetz, Weber, Kashman, and Volland (1985) examined the intra-rater reliability of the NHPT in 26 healthy young female adults. Participants were re-assessed after a 1-week interval by the same rater. The Pearson correlation coefficient showed excellent agreement (r = 0.69) for the right hand and adequate agreement (r = 0.43) for the left.

Grice et al. (2003) reproduced the Mathiowetz et al. (1985) study in order to estimate the intra-rater reliability of the NHPT after its design was slightly modified. In the Mathiowetz et al. study, the NHPT equipment was composed of a wooden board for the holes and a wooden square container for the pegs. The NHPT equipment was then modified to a plastic board with a shallow round dish as the container, at the end of the board. Pearson correlation coefficients for the new NHPT were reported as adequate (r = 0.46; r = 0.44) for the right and left hand, respectively.

Inter-rater:
Heller et al. (1987) examined the inter-rater reliability of the NHPT, Frenchay Arm Test (Heller et al., 1987), Finger Tapping Rate (Lezak, 1983), and Grip Strength (Mathiowetz et al., 1985) in 10 patients with chronic stroke. Participants were assessed twice within a week by two different raters. Spearman rho correlation coefficients were excellent (ranging for all four measures from r = 0.75 to 0.99).
Note: In this study, individual values for each measure were not provided. Although it is not possible to discern the exact value for the NHPT’s reliability, all values were considered excellent.

Mathiowetz et al. (1985) examined the inter-rater reliability of the NHPT in 26 healthy young female adults. Participants were evaluated simultaneously and independently by two raters. Pearson correlation coefficients showed excellent agreement (r = 0.97; r = 0.99) for the right and left hand, respectively.

Grice et al. (2003) reproduced the Mathiowetz et al. (1985) study to estimate the inter-rater reliability of the new NHPT. Pearson correlation coefficients showed excellent agreement (r = 0.98; r = 0.99) for the right and left hand, respectively.

Validity

Content:

Not available.

Criterion:

Concurrent:
Sunderland et al. (1989) estimated the sensitivity of the NHPT, the Motor Club Assessment (Ashburn, 1982) and the Motricity Index (Demeurisse et al., 1980) by comparing them to the Frenchay Arm Test (Heller et al., 1987), as the gold standard, in 31 participants with acute stroke. The NHPT had the lowest sensitivity, with 27% of the cases incorrectly classified. The most sensitive measure, with 0% of cases misclassified, was the Motricity Index.

Lin, Chuang, Wu, Hsieh and Chang (2010) compared the concurrent validity of the NHPT, Action Research Arm Test (ARAT) and Box and Block Test (BBT) for evaluating hand dexterity in 59 patients with stroke. The Fugl-Meyer Assessment of Sensorimotor Recovery After Stroke (FMA), Motor Activity Log (MAL) and Stroke Impact Scale (SIS) were also administered to assess the concurrent validity of the NHPT, ARAT and BBT. Using the Spearman rank correlation coefficient, the NHPT, ARAT and BBT were found to have adequate to excellent correlations with one another at pre-treatment (ranging from rho = -0.55 to -0.80) and post-treatment (ranging from rho = -0.57 to -0.71). In addition, the ARAT and BBT were found to have adequate correlations with the FMA, MAL and SIS (ranging from rho = 0.31 to -0.59); however, the NHPT had only poor to adequate correlations with the FMA and MAL (ranging from rho = -0.16 to -0.33) and adequate to excellent correlations with the SIS (ranging from rho = -0.58 to -0.66). When considering the results of both the responsiveness and validation components of the study, the ARAT and BBT are believed to be more appropriate than the NHPT for evaluating dexterity.
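The negative coefficients reported above are what would be expected if the NHPT is scored as completion time (lower is better) while the comparison measure is scored so that higher is better. The Python sketch below computes a Spearman rank correlation as a Pearson correlation on average ranks. The scores and function names are made up for illustration and do not come from Lin et al. (2010).

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for five clients (not real study data).
nhpt_seconds = [22, 35, 60, 28, 45]  # time to complete: lower is better
bbt_blocks = [55, 38, 20, 50, 40]    # blocks moved: higher is better

rho = spearman_rho(nhpt_seconds, bbt_blocks)
print(f"Spearman rho = {rho:.2f}")
```

In this example the clients who are slowest on the NHPT also tend to move the fewest blocks, so rho is strongly negative even though the two measures agree about who has better dexterity.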

Predictive:
Sunderland et al. (1989) examined whether the NHPT, Motor Club Assessment (Ashburn, 1982) and Motricity Index (Demeurisse et al., 1980) were able to predict functional outcomes at six months after stroke, as measured by the Frenchay Arm Test (Heller et al., 1987). Predictive validity of the NHPT was examined in 31 participants with acute stroke. Assessments were performed at four points in time: admission, 1, 3 and 6 months post-stroke. The NHPT administered at 1 month did not predict functional outcomes at 6 months. The best predictor of functional outcomes at 6 months was the Motricity Index.

Construct:

Convergent/Discriminant:
Parker, Wade, and Hewer (1986) tested the construct validity of the NHPT by comparing the NHPT to the Motricity Index (Demeurisse et al., 1980) in 187 persons with stroke. The correlation between the NHPT and the Motricity Index was excellent (r = 0.82).

Known groups:
No studies have examined the known-groups validity of the NHPT.

Responsiveness

Jacob-Lloyd et al. (2005) examined the responsiveness of the NHPT in 50 persons with stroke. Participants were assessed twice within a 6-month interval. The first assessment was at hospital discharge. Effect sizes were calculated using the Wilcoxon signed rank test. Although the authors reported a large effect size in this study, no reference values were provided. The NHPT was more likely to detect change than the Motricity Index (Demeurisse et al., 1980).

Lin, Chuang, Wu, Hsieh and Chang (2010) evaluated the responsiveness of the NHPT, the Action Research Arm Test (ARAT) and Box and Block Test (BBT) for evaluating hand dexterity in 59 patients with subacute stroke (< 6 months) and Brunnstrom stage IV to VI for proximal and distal upper extremity function. Patients were randomly assigned to receive constraint-induced therapy, bilateral arm training or control treatment and received 2 hours of therapy, 5 days per week for 3 weeks. Assessments were performed at baseline and 3 weeks. Using the Standardized Response Mean (SRM; the mean change divided by the standard deviation of the change scores) to calculate responsiveness, the NHPT, ARAT and BBT were all found to have moderate SRMs (0.64, 0.79 and 0.74, respectively), indicating sensitivity for detecting change in hand dexterity. When considering the results of both the responsiveness and validation components of the study, the ARAT and BBT are believed to be more appropriate than the NHPT for evaluating dexterity.
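As a rough illustration of how a Standardized Response Mean is obtained (mean change divided by the standard deviation of the change scores), the Python sketch below works through a small made-up data set. The times, the function name and the choice to express change as baseline minus follow-up (so that faster NHPT performance gives a positive value) are assumptions made for this example, not details reported by Lin et al. (2010).

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM = mean of the change scores / sample SD of the change scores.

    Change is taken as baseline minus follow-up, so a shorter completion
    time at follow-up counts as positive improvement (an assumption made
    here for a timed test).
    """
    changes = [pre - post for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical NHPT completion times in seconds (not real study data).
baseline_seconds = [40, 35, 50, 45, 38]
follow_up_seconds = [36, 34, 44, 43, 33]

srm = standardized_response_mean(baseline_seconds, follow_up_seconds)
print(f"SRM = {srm:.2f}")
```

Because the denominator is the variability of the change itself, a measure can show a sizeable SRM even when the average change is small, provided that participants change by similar amounts.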

References

Ashburn, A. (1982). A physical assessment for stroke patients. Physiotherapy, 68, 109-113.
Carr, J.H., Shepherd, R.B., Nordholm, L., & Lynne, D. (1985). Investigation of a new motor assessment scale for stroke patients. Physical Therapy, 65, 175-180.
Croarkin, E., Danoff, J., & Barnes, C. (2004). Evidence-based rating of upper-extremity motor function tests used for people following a stroke. Physical Therapy, 84, 62-74.
Cromwell, F.S. (1965). Occupational therapists manual for basic skills assessment: primary prevocational evaluation. California, USA: Fair Oaks Printing.
Davis, J., Kayser, J., Matlin, P., Mower, S., & Tadano, P. (1999). Nine hole peg tests – are they all the same? Occupational Therapy Practice, 4, 59-61.
Dekker, C.L., Van Staalduinem, A.M., Beckerman, H., Van der Lee, J.H., Koppe, P.A., & Zondervan, R.C.J. (2001). Concurrent validity of instruments to measure upper extremity performance: the action research arm test; the nine hole peg test and the motricity index. Nederlands Tijdschrift voor Fysiotherapie, 111(15), 110-115.
Demeurisse, G., Demol, O., & Robaye, E. (1980). Motor evaluation in vascular hemiplegia. European Neurology, 19(6), 382-389.
Desrosiers, J., Rochette, A., Hebert, R., & Bravo, G. (1997). The Minnesota manual dexterity test: reliability, validity and reference values studies with healthy elderly people. Canadian Journal of Occupational Therapy, 64(5), 270-276.
Fugl-Meyer, A.R., Jääskö, L., Leyman, I., Olsson, S., & Steglind, S. (1975). The post-stroke hemiplegic patient 1. A method for evaluation of physical performance. Scandinavian Journal of Rehabilitation Medicine, 7, 13-31.
Grice, K.O., Vogel, K.A., Le, V., Mitchell, A., Muniz, S., & Vollmer, M.A. (2003). Adult norms for a commercially available nine hole peg test for finger dexterity. American Journal of Occupational Therapy, 57, 570-573.
Gowland, C., VanHullenaar, S., Torresin, W., et al. (1995). Chedoke-McMaster Stroke Assessment: development, validation, and administration manual. Hamilton, ON, Canada: School of Rehabilitation Science, McMaster University.
Hatanaka, T., Koyama, T., Kanematsu, M., Takahashi, N., Matsumoto, K., & Domen, K. (2007). New evaluation method for upper extremity dexterity of patients with hemiparesis after stroke: the 10-second tests. International Journal of Rehabilitation Research, 30(3), 243-247.
Heller, A., Wade, D.T., Wood, V.A., Sunderland, A., Hewer, R., & Ward, E. (1987). Arm function after stroke: measurement and recovery over the first three months. Journal of Neurology, Neurosurgery & Psychiatry, 50(6), 714-719.
Jacob-Lloyd, H.A., Dunn, O.M., Brain, N.D., & Lamb, S.E. (2005). Effective measurement of the functional progress of stroke clients. British Journal of Occupational Therapy, 68(6), 253-259.
Jebsen, R.H., Taylor, N., Trieschmann, R.B., Trotter, M.J., & Howard, L.A. (1969). An objective and standardized test of hand function. Archives of Physical Medicine and Rehabilitation, 50, 311-319.
Kellor, M., Frost, J., Silberberg, N., Iversen, I., & Cummings, R. (1971). Hand strength and dexterity. American Journal of Occupational Therapy, 25, 77-83.
Lezak, M.D. (1983). Neuropsychological assessment. Oxford, England: Oxford University Press.
Lincoln, N.B. & Leadbitter, D. (1979). Assessment of motor function in stroke patients. Physiotherapy, 65, 48-51.
Lin, K-C., Chuang, L-L., Wu, C-Y., Hsieh, Y-W., & Chang, W-Y. (2010). Responsiveness and validity of three dexterous function measures in stroke rehabilitation. Journal of Rehabilitation Research and Development, 47(6), 563-572.
Lindmark, B. & Hamrin, E. (1988). Evaluation of function capacity after stroke as a basis for active intervention: Presentation of a modified chart for motor capacity assessment and its reliability. Scandinavian Journal of Rehabilitation Medicine, 20, 103-109.
Lyle, R.C. (1981). A performance test for assessment of upper limb function in physical rehabilitation treatment and research. International Journal of Rehabilitation Research, 4, 483-492.
Mathiowetz, V., Weber, K., Kashman, N., & Volland, G. (1985). Adult norms for the nine hole peg test of finger dexterity. Occupational Therapy Journal of Research, 5, 24-33.
Mathiowetz, V., Kashman, N., Volland, G., Weber, K., Dowe, M., & Rogers, S. (1985). Grip and pinch strength: normative data for adults. Archives of Physical Medicine and Rehabilitation, 66, 69-72.
Parker, V. M., Wade, D. T., & Hewer, R. (1986). Loss of arm function after stroke: measurement, frequency, and recovery. International Rehabilitation Medicine, 8(2), 69-73.
Sommerfeld, D.K., Eek, E.U.B., Svensson, A.K., Holmqvist, L.W., & Arbin, M.H. (2004). Spasticity after stroke: its occurrence and association with motor impairments and activity limitations. Stroke, 35, 134-140.
Sunderland, A., Tinson, D., Bradley, L., & Hewer, R. (1989). Arm function after stroke: an evaluation of grip strength as a measure of recovery and a prognostic indicator. Journal of Neurology, Neurosurgery & Psychiatry, 52, 1267-1272.
Tiffin, J. (1968). Purdue Pegboard Examiner Manual. Chicago, USA: Science Research Associates.
Wade, D.T. (1992). Measurement in Neurological Rehabilitation. Oxford, England: Oxford University Press.

See the measure

How to obtain the NHPT?

The NHPT instructions can be found in Mathiowetz et al. (1985) and Wade (1992).

Davis, Kayser, Matlin, Mower, and Tadano (1999) reported that the most commonly used standardized equipment for the NHPT in the United States is produced by Smith and Nephew Rehabilitation, Inc., and by Sammons Preston.

Standardized equipment can be obtained at the website: http://www.sammonspreston.com/Supply/Product.asp?Leaf_Id=A8515