


Tips

Module 7: Analyzing Data

Descriptive Statistics for Numerical Analysis

(adapted from A Guide to Monitoring and Evaluating Adolescent Reproductive Health Programs)

Descriptive statistics are used to describe the general characteristics of a set of data. You can use these methods with any of the numerical indicators you have at any stage of monitoring and evaluation. Descriptive statistics include:

  • frequencies,
  • percents,
  • counts, and
  • averages (means and medians).

A frequency states the number of occurrences or observations for a variable. For example, if the variable you are examining is ‘respondent has one child’, and you found that of the 135 teenage girls you interviewed who gave valid responses, 52 had one child, you are stating a frequency. When frequencies related to a single item are listed together, this is referred to as a frequency distribution. For example, if you found that of these 135 girls, 65 had no children, 52 had one child, 14 had two children and 4 had three children, you could present a frequency distribution of these statistics, shown below.


Frequency Distribution of Number of Children among 13-19 Year-Old Girls

Number of children | Number of girls who gave this response | Percent
0 | 65 | 65/135 = 48.1%
1 | 52 | 52/135 = 38.5%
2 | 14 | 14/135 = 10.4%
3 | 4 | 4/135 = 3.0%
Total valid responses | 135 | 100%
Missing | 13 | 13/148 = 8.8% of all 13-19 year-old girls in the sample had missing data for this item

Other descriptive statistics:
  • Median number of children among 13-19 year-old girls: one child.
  • 51.9% of 13-19 year-old girls have at least one child.
  • Mode (the most frequent response 13-19 year-old girls gave to this question): zero children.

The first column in the frequency distribution shows the response categories (0 children, 1 child, 2 children, 3 children). The second column shows the frequency; in other words, the number of respondents that fall into each of these categories. For example, the table shows that 65 respondents have 0 children, while 52 report having one child. The third column in the frequency distribution shows the percent of the total respondents that fall into each response category.

For example, of the total of 135 respondents, 48.1% have no children, while 38.5% have one child.


Using a percent instead of a count standardizes your results so that you can compare them more easily, for example across groups of different sizes. A percentage is calculated by dividing the frequency of a response category by the total number of valid observations, then multiplying by 100. For example, the 65 girls who have 0 children make up 48.1% of the 135 girls with valid responses: 65/135 = 0.481, and 0.481 × 100 = 48.1%.
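If your responses are already in a spreadsheet or a simple list, the counting and dividing above can be automated. The following is a minimal Python sketch, assuming that missing answers are recorded as `None`; the function name `frequency_distribution` is our own illustration, not part of any toolkit. It rebuilds the responses from the counts in the table above and reproduces the frequencies and percents, using only valid responses as the denominator.

```python
from collections import Counter

def frequency_distribution(responses):
    """Return {category: (count, percent)} for valid (non-missing) responses.

    Missing answers are assumed to be recorded as None; they are left out
    of the denominator so percents are based on valid responses only.
    """
    valid = [r for r in responses if r is not None]
    counts = Counter(valid)
    total = len(valid)
    return {
        category: (count, round(100 * count / total, 1))
        for category, count in sorted(counts.items())
    }

# Responses rebuilt from the counts in the table above:
# 65 girls with 0 children, 52 with 1, 14 with 2, 4 with 3, and 13 missing.
responses = [0] * 65 + [1] * 52 + [2] * 14 + [3] * 4 + [None] * 13

dist = frequency_distribution(responses)
for category, (count, pct) in dist.items():
    print(f"{category} children: {count} girls ({pct}%)")

# Percent with at least one child, again out of valid responses only.
valid_total = sum(count for count, _ in dist.values())
at_least_one = sum(count for category, (count, _) in dist.items() if category >= 1)
print(f"At least one child: {round(100 * at_least_one / valid_total, 1)}%")
```

The same function works for any categorical item; only the list of responses changes.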

Descriptive statistics can be useful for noticing trends or patterns in your data. For instance, you may be surprised that about half (51.9%) of the girls had at least one child. By extending the analysis to include another variable, such as marital status, you can get more information.


You can show frequencies for the relationship between two or more variables through a cross-tabulation. For instance, to get a better picture of who is having children, you might want to present the relationship between the number of children and the marital status of these teenage girls. Statistical or tabulation software (such as SPSS) can do this for you. With small amounts of data, you can also count the occurrences and create the table by hand.

A table of this cross-tabulation can be presented as shown below.


Percent of Married and Unmarried 13-19 Year-Old Girls by Number of Children

Number of children | Married (%) | Unmarried (%) | Total (%)
0 | 7.9 | 83.3 | 48.1
1 | 63.4 | 16.7 | 38.5
2 | 22.2 | 0.0 | 10.4
3 | 6.3 | 0.0 | 3.0
Total | 100 | 100 | 100

This cross-tabulation shows that, as expected, unmarried girls are more likely than married girls to have no children. This is shown even more clearly by collapsing the data into fewer categories, such as “zero children” and “one or more children”.


Number of children | Married (%) | Unmarried (%) | Total (%)
0 | 7.9 | 83.3 | 48.1
1 or more | 92.1 | 16.7 | 51.9
Total | 100 | 100 | 100

The cross-tabulation also shows that some unmarried girls have children and some married girls do not. While this example is obvious, it shows how useful this kind of analysis can be with less obvious, more subtle data.
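For readers who tabulate with a script rather than SPSS or by hand, the cross-tabulation and the collapsing step can be sketched in a few lines of Python. This is a minimal sketch, assuming each case is a record with a marital status and a number of children; the records below are a small hypothetical example, not the guide's survey data, and `crosstab_column_percents` is an illustrative name. Each cell is expressed as a percent of its column (married or unmarried), as in the tables above.

```python
from collections import Counter

def crosstab_column_percents(records, row_key, col_key):
    """Cross-tabulate two variables; each cell is a percent of its column total.

    Records with a missing value (None) on either variable are left out,
    so each column percent is based on cases with valid data.
    """
    pairs = [(r[row_key], r[col_key]) for r in records
             if r[row_key] is not None and r[col_key] is not None]
    cells = Counter(pairs)
    col_totals = Counter(col for _, col in pairs)
    rows = sorted({row for row, _ in pairs})
    return {
        col: {row: round(100 * cells[(row, col)] / total, 1) for row in rows}
        for col, total in col_totals.items()
    }

# Hypothetical records (marital status, number of children), not the guide's data.
data = [("married", 0), ("married", 1), ("married", 1), ("married", 2),
        ("unmarried", 0), ("unmarried", 0), ("unmarried", 0), ("unmarried", 1),
        ("unmarried", None)]
records = [{"marital_status": m, "children": c} for m, c in data]

table = crosstab_column_percents(records, "children", "marital_status")
print(table)

# Collapse number of children into "0" and "1 or more", then tabulate again.
for r in records:
    if r["children"] is None:
        r["children_group"] = None
    else:
        r["children_group"] = "0" if r["children"] == 0 else "1 or more"
collapsed = crosstab_column_percents(records, "children_group", "marital_status")
print(collapsed)
```

Collapsing is done by recoding into a new variable rather than overwriting the original, so the detailed categories remain available for other analyses.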




Forms of Qualitative Analysis

  • Pre-determined categories - if you have decided beforehand what you need to know, you can look through the data and code it using those categories, in terms of the presence or absence of certain events, findings, perceptions, etc. These are often reported in terms of the frequency with which people or documents expressed a certain attitude or perception, or used certain words or phrases. This is the most basic and realistic form of analysis for most people who do not have extensive experience analyzing qualitative data.
  • Interpretative theme - one way of doing the analysis is by reading and re-reading documents or transcripts of interviews to identify themes that appear relevant to the program. Such themes come from the data and are not ones that were pre-determined before reading the data. Often such themes are helpful for exploring new areas and uncovering obstacles or problems that program planners may not have been aware of beforehand.
  • Quotes or testimonies - a way to report the actual data. Direct quotes from interviews are used to substantiate a point you are making about how the people interviewed see things. They should be written in quotation marks to distinguish them from paraphrasing, i.e., condensing another person’s communication in your own words, without quotation marks.
  • Typologies or ideal types – after reviewing all of your data, you may want to report that there are one or more “types” of people or experiences that are relevant to the program. For example, you may find that young women who have an older sister they are close to experience their sexuality more positively, feel happier and have more decision-making power in their relationships with their partners. Ideal types are generated from the data; you did not know they existed beforehand.
  • Taxonomy or diagram of meanings – one way of presenting the data would be to draw out a diagram of how the respondents link different words and meanings related to the program, giving important insight into how they actually understand key concepts. People’s understanding of concepts such as gender equity, reproductive health and healthy sexuality may change over the course of the program and therefore such diagrams could reveal important effects.
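When transcripts are available as text files, a first pass at the pre-determined categories approach can be sketched in Python: for each category, count how many transcripts mention it. This is a minimal sketch under loud assumptions: the category names and keyword lists are hypothetical, and plain substring matching is crude (for example, "cost" would also match "costume"), so a human coder should still read each flagged passage before anything is reported.

```python
# Hypothetical pre-determined categories and the words taken as evidence of each.
CODES = {
    "cost_barrier": ["cost", "afford", "expensive"],
    "provider_respect": ["respect", "listened"],
}

def count_documents_per_code(transcripts, codes):
    """Count how many transcripts mention at least one keyword for each code.

    Matching is case-insensitive substring search, so results are a
    starting point for human review, not a final coding.
    """
    counts = {code: 0 for code in codes}
    for text in transcripts:
        lowered = text.lower()
        for code, keywords in codes.items():
            if any(word in lowered for word in keywords):
                counts[code] += 1
    return counts

# Hypothetical interview excerpts.
transcripts = [
    "The clinic was too expensive for me.",
    "The nurse listened and treated me with respect.",
    "I could not afford the bus fare, but staff were kind.",
]
print(count_documents_per_code(transcripts, CODES))
```

Counting documents rather than individual mentions matches the way such findings are usually reported, e.g., "2 of 3 respondents mentioned cost".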



How Can You Know If a Difference or Change Is Significant or Meaningful?

Even if you have an experienced data analyst on your team who will use inferential statistics to assess whether observed changes or differences are likely to be due to chance or something else, you’ll need to determine whether a change or difference is large enough to be meaningful.
In small, low resource projects you will most likely not be able to use the kind of analysis that allows you to calculate statistical significance. So how can you know if your findings are meaningful?
  • Decide what you would consider meaningful before you collect the data or carry out the analyses: brainstorm with your team before data collection to decide what magnitude of change or difference would be satisfying or notable, and what changes are too small to really matter
  • Read the literature from similar programs to learn what kinds of differences they were able to make and what you could expect to demonstrate; know what is typical and what is possible so you can compare your results to those standards
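Once the team has agreed on a threshold in advance, checking a result against it is simple arithmetic, but it helps to be explicit about whether the threshold is in percentage points or relative percent. The following minimal Python sketch assumes the threshold is expressed in percentage points; the function name and the 10-point threshold are hypothetical illustrations, not recommendations.

```python
def change_summary(baseline_pct, endline_pct, threshold_points):
    """Compare baseline and endline percentages against a threshold set in advance.

    Returns (change in percentage points, change relative to baseline in %,
    whether the pre-set threshold in percentage points was reached).
    """
    if baseline_pct == 0:
        raise ValueError("relative change is undefined when the baseline is 0")
    points = endline_pct - baseline_pct
    relative = 100 * points / baseline_pct
    return points, relative, points >= threshold_points

# Hypothetical: the team agreed beforehand that a rise of at least
# 10 percentage points in contraceptive use would be notable.
print(change_summary(30, 42, threshold_points=10))  # (12, 40.0, True)
print(change_summary(30, 35, threshold_points=10))
```

A rise from 30% to 42% is 12 percentage points but a 40% relative increase; stating which one your threshold refers to avoids later disagreement about whether it was met.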



How to Keep Your Codes Straight?

When you are planning your coding system, make sure to write down the definition of each and every code in a well-organized list usually called a “codebook”. During training, the data collectors should use this codebook to learn how to code their data collection instruments as quickly and accurately as possible. During data collection, if a response comes up that had not been expected, discuss it and decide how to code it, and make sure you enter it into the codebook, and that all the data collectors have an up-to-date copy. Collect old copies so that no one uses them by mistake!
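If data are entered electronically, the codebook can also be kept as a small lookup table and used to catch entries that were never defined. This is a minimal Python sketch; the item names, codes and the use of 9 for "missing" are hypothetical examples, so substitute the definitions from your own codebook.

```python
# Hypothetical codebook: each item maps every allowed code to its definition.
CODEBOOK = {
    "marital_status": {1: "married", 2: "unmarried", 8: "not applicable", 9: "missing"},
    "num_children": {0: "none", 1: "one", 2: "two", 3: "three", 9: "missing"},
}

def find_undefined_codes(records, codebook):
    """List (case_id, item, value) for every value not defined in the codebook.

    A blank (absent) item is reported too, since every cell should hold
    a defined code, including an explicit code for missing data.
    """
    problems = []
    for record in records:
        for item, allowed in codebook.items():
            value = record.get(item)
            if value not in allowed:
                problems.append((record["case_id"], item, value))
    return problems

records = [
    {"case_id": 101, "marital_status": 1, "num_children": 2},
    {"case_id": 102, "marital_status": 3, "num_children": 0},  # 3 is not a defined code
]
print(find_undefined_codes(records, CODEBOOK))
```

When a new response category is added during data collection, it is added to `CODEBOOK` as well, so the check stays in step with the paper codebook.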



    Kind of Analysis According to Level of Numerical Indicators

    Your analyses depend on the level of measurement of the numerical indicator you are using. The levels are:

    • Categorical or nominal – these groups have no numerical value, but you can count the number of people, documents or events that fall into these categories.
    • Ordinal – responses can clearly be ranked in order of how much or how little of a trait is present, but the distance between adjacent responses is not known. Responses are typically represented by words that can be ranked from low to high (e.g., never, rarely, seldom, often). This is the most common kind of data most small evaluations will be working with.
    • Interval – numerical values that are equally distant from each other, such as weights, heights or counts.

    The analyses suited to each level are:

    • Categorical or nominal – groupings with no real numerical value. Don’t be confused if the categories were identified by numbers, such as 1 = married, 2 = unmarried; these are labels, not real values.
      • report the percent who gave each response
      • repeat the above separately for subgroups of respondents
      • the denominator should be the number of people with non-missing data
      • report the number of missing cases at the same time (if the number missing is similar across all items, report it once)
    • Ordinal – groupings with relative numerical value but without necessarily equal distances between values, such as 1 = high, 2 = medium, 3 = low. A common example is a scale asking people to rate their comfort level from 1 (not at all comfortable) to 5 (very comfortable). Do not calculate an average (mean) with this kind of data.
      • report the median, or middle value
        • to calculate a median by hand, sort the data from lowest to highest and take the middle value
      • calculate on the number of people with valid data
      • you can also present the percent of responses in each category
      • collapse categories that are similar, e.g., strongly agree and somewhat agree
      • to compare ordinal scores for one person at two different times (such as baseline and endline), you can only identify the direction of the change, i.e., negative change, no change or positive change; if you have enough data, you can break down those who started high or low and see whether the amount or direction of change varies
    • Interval – a series of numbers with equal distance between each number. You will need this level of measurement to be able to calculate statistical significance. Indicators at this level are actual numbers with precise amounts, e.g., number of hours a woman was in labor, number of miles walked to get to a health center.
      • use frequencies, percents and cross-tabulations
      • calculate both the arithmetic mean (average) and the median to check whether the distribution is skewed (that is, many cases piling up at either very high or very low values, which can create a misleading picture when the data are summarized only with the mean); if the mean and median are similar, you can report the mean
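The two ordinal-level steps above (finding the median, and classifying each person's change between two time points by direction only) can be sketched in Python with the standard `statistics` module. This is a minimal sketch under stated assumptions: ratings are on a hypothetical 1-5 comfort scale where higher means more comfortable, missing answers are `None`, and with an even number of valid answers it reports the lower of the two middle values (`median_low`) so the result is always an actual scale point; some analysts prefer to report both middle values instead.

```python
import statistics
from collections import Counter

def ordinal_median(scores):
    """Middle value of ordinal scores, ignoring missing answers (None).

    With an even number of valid scores, median_low returns the lower of
    the two middle values, so the result is always a real scale point.
    """
    valid = [s for s in scores if s is not None]
    return statistics.median_low(valid)

def direction_of_change(baseline, endline):
    """Classify one person's change, assuming higher scores mean more comfort."""
    if endline > baseline:
        return "positive change"
    if endline < baseline:
        return "negative change"
    return "no change"

comfort = [1, 2, 2, 3, 4, 5, 5, None]       # hypothetical 1-5 ratings
print(ordinal_median(comfort))               # 3

pairs = [(2, 4), (3, 3), (4, 2), (1, 3)]     # hypothetical (baseline, endline) pairs
print(Counter(direction_of_change(b, e) for b, e in pairs))
```

Note that the sketch never subtracts scores to get an "amount" of change; for ordinal data it only counts how many people moved up, stayed the same or moved down.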



    Rights-Based Social Justice Considerations for both Quantitative and Qualitative Analyses

    Rights-based approach:

    • some sub-groups may have more or less benefit from the program, e.g. clients who chose their contraceptive methods vs. clients who accepted methods suggested by the provider; people who felt the program was not respectful of their rights vs. those who felt it was; people who expressed high levels of rights violations vs. those with lower levels
    • given issues of confidentiality, there may be limits on how well you can compare on a case-by-case basis, since cases may not be identifiable if you did not use unique identifiers; to use such unmatched comparisons, you need to make sure that the groups in your pre- and post-tests are not very different

    Comprehensive and holistic sexual and reproductive health care:

    • follow-up on referrals made by the program to other services may be difficult for reasons of confidentiality, loss of cases in other agencies and inaccurate addresses
    • different components of a comprehensive service package may need to be analyzed separately if they are of varying quality
    • perceptions of quality of care may differ significantly across different sub-groups of clients and providers so you may need to separate out these groups

    Gender equality, gender equity and women’s empowerment:

    • separate out sub-groups of women who differ in terms of power and status, e.g., married vs. common law vs. single vs. widowed vs. divorced
    • analyze male and female respondents separately
    • compare responses by women/girls and those of men/boys on similar questions

    Promotion of healthy sexuality through the life cycle:

    • compare responses of girls/women vs. boys/men; if possible in linked couples, but if not, through group data; also compare responses of parents and their children
    • it is important to consider composite measures of good outcomes, e.g., delaying sexual debut and using contraception in consensual sexual relations are both positive outcomes

    Participation of women, communities and civil society organizations (CSOs):

    • compare the results of programs that have well-functioning community advisory councils vs. those whose councils do not function well
    • if participation is your goal or intermediate result, make sure to compare programs that have mechanisms that foster active participation by priority populations vs. those that do not
    • are women or young people participating actively? Do they feel their voices are being heard?

    Attention to marginalized and vulnerable populations:

    • compare data between subpopulations that differ in terms of marginality or vulnerability
    • assess the degree to which marginalized and/or vulnerable populations may have had different perceptions and levels of acceptance of the program
    • compare the differential effect of the program on groups that vary along dimensions of marginality and vulnerability and between groups that differ in how well they felt integrated or served by the program


    Steps for Qualitative Data Analysis

    There are multiple ways to analyze qualitative data, which vary in terms of:
    • Skill required of the person analyzing the data
    • Time needed for analysis
    • Depth and richness with which one wants to understand the meanings people give to the topics that have been discussed
    To develop your analytical categories:
    If you have people who are highly skilled at analyzing qualitative data, you might:
    • First, read each entire transcript without writing anything down, just as you would read a novel. The purpose is to get an overall understanding of the meanings expressed and to start looking for themes that emerge
    • Next you will read it again and start writing descriptive codes, meaning words that describe the gist of what someone has said, next to quotes or paragraphs in the transcript
    • Next you will read through the transcripts again, and paying particular attention to the codes you have generated, you will start to create the larger themes which connect your codes
    • Next you will read it again and code your data with these larger themes
    • Next discuss the themes with your team and try to find mention of such themes in published literature or in the findings of programs similar to yours. Refine the codes
    • Re-read the transcript coding each mention of the themes you have identified
    • Now you can divide the transcripts into groups of people, documents or events that differ on important characteristics related to the program to make the kinds of comparisons in your analysis plan
    If your team has less experience with qualitative analysis, you can:
    • Analyze the answers to groups of questions separately
    • Use codes that have been decided upon ahead of time and are directly driven by and linked to your specific evaluation questions and program objectives or that have come from a first reading of the transcripts and feedback from the interviewers about major findings they have observed
    Once you have agreed upon the categories of analysis:
    • Read over the transcripts and mark the instances of the analytical categories, or topics, you agreed upon
    • You will next want to separate out the transcripts according to the comparisons you have planned to make, e.g., those who attended your program vs. those who did not; interviews before the program vs. interviews after the program
    • You will also want to look at how the results may differ within each of these groups. Did certain kinds of people differ in how they responded to the program? For example, did young men who had admitted to beating a sexual partner show greater changes after the program than young men who reported never having been violent? Decisions about which of these internal comparisons to make should:
      • have been planned ahead of time
      • derive from discussions with program staff and other stakeholders
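After transcripts have been coded, the planned comparisons come down to counting, within each group, how many transcripts were marked with each theme. The following minimal Python sketch assumes each coded transcript is stored with its comparison group and the set of themes a coder marked in it; the group names and themes are hypothetical. It reports counts as "k of n" rather than percents, since qualitative groups are usually small.

```python
from collections import defaultdict

# Hypothetical coded transcripts: a comparison group plus the themes a coder marked.
coded = [
    {"group": "attended", "themes": {"more_confident", "talked_to_partner"}},
    {"group": "attended", "themes": {"more_confident"}},
    {"group": "did_not_attend", "themes": {"talked_to_partner"}},
    {"group": "did_not_attend", "themes": set()},
]

def theme_counts_by_group(transcripts):
    """Return {group: (number of transcripts, {theme: transcripts mentioning it})}."""
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for t in transcripts:
        sizes[t["group"]] += 1
        for theme in t["themes"]:
            counts[t["group"]][theme] += 1
    return {group: (sizes[group], dict(counts[group])) for group in sizes}

result = theme_counts_by_group(coded)
for group, (n, themes) in result.items():
    for theme, k in sorted(themes.items()):
        print(f"{group}: {theme} in {k} of {n} transcripts")
```

The same structure works for internal comparisons (for example, splitting one group by a respondent characteristic) by changing what is stored in the `group` field.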


    Suggestions for Data Tabulation by Hand

    Create a spreadsheet that has:

    • One row for each case, i.e. person, document or event
    • Columns to identify the case, the data collector, the date, the site/location of data collection and other essential identification information
    • One column for each of the items on the data collection instrument
    • If you are comparing the same case at baseline, midline and/or endline, columns for the later data to the right of the initial data on the same row. Ideally, each case will have all its data in one row.

    Make sure to:

    • expect an entry in every cell. For example, deliberately code missing or not applicable cells as such, rather than leaving blank.
    • verify each data point to make sure it is correct. Look for unlikely answers, such as 3 entered as age at first voluntary sexual intercourse when it should have been 23. Refer to the original data collection instrument to make corrections.
    • record all data as they arrive – don’t wait until the very end of data collection
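If the spreadsheet is saved as a CSV file, the checks above (no blank cells, no implausible values) can be run automatically before anyone goes back to the paper forms. This is a minimal Python sketch using the standard `csv` module; the column names, the use of 99 as the missing-value code and the plausibility limits of 10 and 50 years are all hypothetical and should come from your own codebook.

```python
import csv
import io

MISSING_CODE = "99"   # hypothetical code for "missing"; use the one in your codebook

# Hypothetical data-entry file: one row per case, identification columns first.
DATA = """case_id,collector,date,site,age_first_sex
1,AM,2009-03-02,North,17
2,AM,2009-03-02,North,3
3,JK,2009-03-04,South,99
4,JK,2009-03-04,South,
"""

def check_age_column(text, low=10, high=50):
    """Flag blank cells and implausible ages for checking against the original forms.

    low/high are hypothetical plausibility limits, not recommended values.
    """
    problems = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row["age_first_sex"].strip()
        if value == "":
            problems.append((row["case_id"], "blank: code missing values explicitly"))
        elif value == MISSING_CODE:
            continue  # explicitly coded as missing, which is what we want
        elif not value.isdigit() or not low <= int(value) <= high:
            problems.append((row["case_id"], f"unlikely value {value!r}: check original instrument"))
    return problems

for case_id, issue in check_age_column(DATA):
    print(case_id, issue)
```

Running a check like this each time new forms are entered fits the advice to record data as they arrive, since errors are easier to trace while collectors still remember the interviews.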


    What If You Have No Pilot Test Data?

    If there was absolutely no time to pilot test the instruments and data collection plan, and/or data collection is already underway, you can:

    • take a small sample of early data and run an initial analysis
    • have a small group of colleagues and the data collectors fill in the instruments as if they were real respondents. Use these data to test out your plans for analysis.


    When Can You Use Inferential Statistics?

    The next level of analysis after descriptive statistics is inferential statistics, which allows you to draw conclusions about the larger, more general population. However, to use this kind of analysis you need to have:

    • the right sample - planned from the outset
    • expert assistance
    • access to computer programs that help perform the calculations
    • fairly large amounts of data (usually at least 200 cases)
    • interval numeric indicators that are not biased


    When Not to Use a Percentage?

    Percentages are calculated by dividing the numerator by the denominator and then multiplying by 100. In the strictest sense, a percentage means how many times something happened out of 100 instances. Percentages are often calculated with denominators smaller than 100, but they can be misleading when the denominator is very small. If your sample is small, say fewer than 20 cases, report the findings as the number of cases with a certain response out of the total number of cases assessed: for example, “3 out of 5 cases showed X result” rather than “60%”. This is because each person has a misleadingly large influence on the percent when the denominator is small.
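The rule above is easy to apply consistently in a reporting script. This minimal Python sketch takes the cut-off of 20 from the text as a default; the function name `report_proportion` is a hypothetical illustration.

```python
def report_proportion(count, total, min_denominator=20):
    """Describe count/total as 'X out of N' for small denominators, else as a percent.

    min_denominator follows the rule of thumb in the text (fewer than 20 cases).
    """
    if total <= 0:
        raise ValueError("total must be a positive number of cases")
    if total < min_denominator:
        return f"{count} out of {total}"
    return f"{100 * count / total:.1f}%"

print(report_proportion(3, 5))      # 3 out of 5
print(report_proportion(65, 135))   # 48.1%
```

Keeping the threshold as a parameter makes it easy to apply a stricter cut-off if your team agrees on one.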




    When to Use the Mean, the Median, or the Mode?

    These are ways of stating the typical response for interval-level data. If you calculate the mean, or average (by adding up all the responses and dividing by the number of responses added up), you can misinterpret the data if they lean one way or another, i.e., are “skewed”. For example, if you ask people how many sexual partners they have or have had, most will answer a relatively small number, like 1 or somewhere up to 5 or 10. Only a few people will answer larger numbers, such as 50 or more. The answers therefore bunch up at the low end, with a long tail of a few high values; this distribution is skewed. If you report the average and even one person reported a very high number of partners, your average would not be typical. So it would be better to report the median in this case (the middle value in a data set, with half of the values above it and half below). When the data are not skewed (for example, when they are normally distributed), the mean and the median will be essentially the same. You can also consider using the mode, the most common value in a data set. This may be useful, for example, if you give a test of knowledge gained after a training and want to know the most common score participants achieved.
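The effect of a single high answer can be seen by computing all three measures on a small example. This is a minimal Python sketch using the standard `statistics` module; the answers are hypothetical, chosen so that one respondent reports 50 partners.

```python
import statistics

# Hypothetical answers to "how many sexual partners have you had?";
# most are small, and one respondent reports 50.
partners = [1, 1, 1, 2, 2, 3, 4, 5, 50]

mean = statistics.mean(partners)
median = statistics.median(partners)
mode = statistics.mode(partners)

print(f"mean={mean:.1f}, median={median}, mode={mode}")  # mean=7.7, median=2, mode=1
```

Here the mean (7.7) is far above what almost everyone answered, while the median (2) and the mode (1) describe the typical respondent; a large gap between mean and median like this is the signal to report the median.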




    Why Analyze Qualitative Data As You Go Along?

    One of the virtues of qualitative data is that it will provide you with insights about new issues to explore and new meanings you may not have expected. This flexibility allows you to add things, revise and reconsider what you want to ask, or what events you want to observe. You should also watch for saturation, which means that you have ample data to support the variation in the kinds of answers you are getting to your questions, or in themes, and that new data add neither needed support for existing findings nor new findings. You may reach saturation on the range of answers to one question, or on one theme, but not on others. That is fine: you need to decide which findings are important enough to spend the resources needed to get adequate support for them.


     
