Software Testing for Document Engineering

Gregory M. Kapfhammer

September 15, 2025

How do you know if your document engineering tool is correct? You test it!

Software testing for document engineering

Review the basics of document engineering
Explore software testing techniques
Learn how to test document engineering tools
Let’s start by learning the importance of correctness!

Document engineering tools and workflows must run correctly

Text processing: does each function parse content correctly?
Document analysis: do algorithms extract the right information?
File operations: are documents read and written correctly?
Content validation: do parsers handle malformed input properly?
Output generation: does the tool create documents as expected?

Software testing techniques help to ensure that document tools work correctly! Let’s learn more about software testing!

Testing document engineering tools gives confidence in correctness

Steps for testing document engineering tools:
- Create sample document input
- Set up the tool’s environment
- Process the input through the tool
- Collect the output from the tool
- Compare output to expected results
- Report any discrepancies as defects
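
The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the strip_trailing_whitespace tool, the file names, and the temporary directory are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def strip_trailing_whitespace(text: str) -> str:
    """Hypothetical document tool: remove trailing spaces from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def run_tool_test() -> list[str]:
    """Follow the six testing steps and return a list of defect reports."""
    defects = []
    # Step 1: create sample document input
    sample_input = "Title   \nBody text. \n"
    expected_output = "Title\nBody text.\n"
    # Step 2: set up the tool's environment (a temporary working directory)
    with tempfile.TemporaryDirectory() as workdir:
        input_path = Path(workdir) / "input.txt"
        output_path = Path(workdir) / "output.txt"
        input_path.write_text(sample_input, encoding="utf-8")
        # Step 3: process the input through the tool
        processed = strip_trailing_whitespace(input_path.read_text(encoding="utf-8"))
        output_path.write_text(processed, encoding="utf-8")
        # Step 4: collect the output from the tool
        actual_output = output_path.read_text(encoding="utf-8")
    # Step 5: compare output to expected results
    if actual_output != expected_output:
        # Step 6: report any discrepancies as defects
        defects.append(f"expected {expected_output!r} but got {actual_output!r}")
    return defects


defects = run_tool_test()
print("defects:", defects)
```

An empty list of defects means the tool behaved as expected for this one sample document; more sample documents give more confidence.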

Testing versus benchmarking for document engineering tools

Testing: Create and run test cases to confirm document tools produce correct output for sample documents
Benchmarking: Measure timing and performance of document processing operations like parsing or formatting
- Testing and benchmarking are complementary methods
- Course focuses on testing document engineering tools
- Explore more about benchmarking in Algorithm Analysis!

How would you test the `Doubler`?

class Doubler:
    def __init__(self, n):
        self._n = 2 * n

    def n(self):
        return self._n

x = Doubler(5)
print(x.n() == 10)
assert x.n() == 10
y = Doubler(-4)
print(y.n() == -8)
assert y.n() == -8

True
True

Establish confidence in the correctness of the Doubler class
When testing, is it better to use print or assert statements?

Explore the use of `print` and `assert`

Key Tasks: After creating assert statements that will pass and fail as expected, decide which you prefer and why! What situations warrant a print statement and which ones require an assert?
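
One way to start this task is sketched below. It reuses the Doubler class from the earlier slide; the incorrect expected value 11 and the failed variable are assumptions chosen only to force a failure.

```python
class Doubler:
    def __init__(self, n):
        self._n = 2 * n

    def n(self):
        return self._n


x = Doubler(5)

# A print statement shows a value, but a person must read and judge it.
print(x.n() == 11)  # displays False and the program keeps running

# A passing assert is silent.
assert x.n() == 10

# A failing assert raises AssertionError; catching it lets this sketch continue.
try:
    assert x.n() == 11, "Doubler(5).n() should be 10, not 11"
    failed = False
except AssertionError as error:
    failed = True
    print("assertion failed:", error)
```

The print call never stops the program, while the failing assert does unless it is caught. That difference is what lets a test framework detect failures without manual inspection.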

Test document processing tools

Answer key questions when testing document tools:
- Does the tool correctly parse document formats?
- After changing code, does processing still work?
Using assertion statements during testing:
- print statements require manual checking of output
- assert statements automatically verify correctness
Use a testing framework like pytest or unittest
Assess test coverage with coverage.py

`unittest` for `DayOfTheWeek`

import unittest
from dayoftheweek import DayOfTheWeek

class TestDayOfTheWeek(unittest.TestCase):
    def test_init(self):
        d = DayOfTheWeek('F')
        self.assertEqual(d.name(), 'Friday')
        d = DayOfTheWeek('Th')
        self.assertEqual(d.name(), 'Thursday')

unittest.main(argv=['ignored'], verbosity=2, exit=False)

<unittest.main.TestProgram at 0x7f9d78d7e510>

Call unittest.main differently for tests outside Quarto
Run test_dayoftheweek.py in slides/weekfour/
The OK output in terminal confirms passing assertions
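
A possible shape for a standalone test file is sketched below. The trimmed DayOfTheWeek copy is an assumption that keeps the example self-contained; a real test file would import the class instead.

```python
import unittest


class DayOfTheWeek:
    """Trimmed copy of the class from these slides so the sketch is self-contained."""

    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.name_map = {"Th": "Thursday", "F": "Friday"}

    def name(self):
        return self.name_map.get(self.abbreviation)


class TestDayOfTheWeek(unittest.TestCase):
    def test_init(self):
        self.assertEqual(DayOfTheWeek("F").name(), "Friday")
        self.assertEqual(DayOfTheWeek("Th").name(), "Thursday")


# In a standalone test file, the conventional ending is:
#     if __name__ == "__main__":
#         unittest.main()
# The loader and runner below do the same work explicitly and return a result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDayOfTheWeek)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("all tests passed:", result.wasSuccessful())
```

The argv and exit arguments used on the slide are only needed inside a notebook-style environment, where unittest.main would otherwise read unrelated command-line arguments and exit the process.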

Explore `DayOfTheWeek`

class DayOfTheWeek:
    """A class to represent a day of the week."""
    def __init__(self, abbreviation):
        """Create a new DayOfTheWeek object."""
        self.abbreviation = abbreviation
        self.name_map = {
            "M": "Monday",
            "T": "Tuesday",
            "W": "Wednesday",
            "Th": "Thursday",
            "F": "Friday",
            "Sa": "Saturday",
            "Su": "Sunday",
        }

    def name(self):
        return self.name_map.get(self.abbreviation)

Support the lookup of a day of the week through an abbreviation like Sa
Simple example helps us learn testing before complex document processing

Exploring test-driven development in Python

Test-driven development (TDD) states “tests before code”:
- How will you use a function?
- What are the function’s inputs and outputs?
- Can you write code to make the tests pass?

The “TDD mantra” is Red-Green-Refactor:
- Red: The tests fail. You haven’t written the code yet!
- Green: You get the tests to pass by changing the code.
- Refactor: You clean up the code, removing duplication.
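
The Red and Green steps can be sketched as follows. The count_headings function is hypothetical and stands in for any document processing function; the try block exists only so the failing Red step does not stop the example.

```python
def test_count_headings():
    """Written first: it states how the function is used and what it returns."""
    assert count_headings("# Title\n\nText\n\n## Section\n") == 2
    assert count_headings("No headings here") == 0
    assert count_headings("") == 0


# Red: calling the test before count_headings exists raises NameError.
try:
    test_count_headings()
    red_phase_failed = False
except NameError:
    red_phase_failed = True


# Green: the simplest implementation that makes the test pass.
def count_headings(markdown_text):
    """Hypothetical function: count lines that start with a '#' character."""
    return sum(1 for line in markdown_text.splitlines() if line.startswith("#"))


test_count_headings()  # passes silently now
print("red phase failed first:", red_phase_failed)
```

The Refactor step would then change the body of count_headings while rerunning test_count_headings after each change.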

How can you refactor Python code?

L1 = [1, 2, 3, 4, 5]
L2 = [6, 7, 8, 9, 10]
avg1 = sum(L1)/len(L1)
avg2 = sum(L2)/len(L2)
print("avg(", L1, ") -->", avg1)
print("avg(", L2, ") -->", avg2)

avg( [1, 2, 3, 4, 5] ) --> 3.0
avg( [6, 7, 8, 9, 10] ) --> 8.0

This code will not work for empty lists!
And, the code is repetitive and hard to read
Can we refactor the program to avoid the defect?

L1 = [1, 2, 3, 4, 5]
L2 = [6, 7, 8, 9, 10]
if len(L1) == 0:
    avg1 = 0
else:
    avg1 = sum(L1) / len(L1)
if len(L2) == 0:
    avg2 = 0
else:
    avg2 = sum(L2) / len(L2)
print("avg(", L1, ") -->", avg1)
print("avg(", L2, ") -->", avg2)

avg( [1, 2, 3, 4, 5] ) --> 3.0
avg( [6, 7, 8, 9, 10] ) --> 8.0

This avoids the defect but is repetitive and hard to read!

def avg(L):
    if len(L) == 0:
        return 0
    else:
        return sum(L) / len(L)

L1 = [1, 2, 3, 4, 5]
L2 = [6, 7, 8, 9, 10]
avg1 = avg(L1)
avg2 = avg(L2)
print("avg(", L1, ") -->", avg1)
print("avg(", L2, ") -->", avg2)

avg( [1, 2, 3, 4, 5] ) --> 3.0
avg( [6, 7, 8, 9, 10] ) --> 8.0

The avg function avoids the defect and is easier to read!

Bug hunt for average computation

Key Tasks: After confirming that the program works for the initial lists in L1 and L2, try to find the defect. Can you make a solution that works for empty lists? How do you know it is correct?
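
One way to answer the last question is to encode the expected behavior as assertions. The sketch below assumes, as the slides do, that the average of an empty list should be 0; the single-element case is an extra check added for illustration.

```python
def avg(L):
    """Return the mean of L, or 0 for an empty list (the choice made in these slides)."""
    if len(L) == 0:
        return 0
    return sum(L) / len(L)


def test_avg():
    assert avg([1, 2, 3, 4, 5]) == 3.0
    assert avg([6, 7, 8, 9, 10]) == 8.0
    assert avg([]) == 0       # the input that exposed the original defect
    assert avg([7]) == 7.0    # a single-element list


# The original one-line computation fails on an empty list.
try:
    sum([]) / len([])
    original_failed = False
except ZeroDivisionError:
    original_failed = True

test_avg()
print("original fails on []:", original_failed)
```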

Refactoring in document engineering

What is refactoring?
- Defined: Better code structure without changing features
- Goal: Enhance aspects of document processing tools
Why refactor document engineering systems?
- Readability: Helps others to understand text processing logic
- Maintainability: Simplifies modifications and debugging
- Reusability: Promotes code reuse across document tools
- Performance: Aim to improve text processing efficiency
Use test cases to confirm correctness of refactoring!
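
As a rough sketch of that last point, the same test cases can be run against the code before and after a refactoring. Both normalize_whitespace functions and the cases list are hypothetical examples, not code from the course.

```python
def normalize_whitespace_before(text):
    """Original version: collapse runs of spaces with an explicit loop."""
    result = ""
    previous_was_space = False
    for char in text:
        if char == " ":
            if not previous_was_space:
                result += char
            previous_was_space = True
        else:
            result += char
            previous_was_space = False
    return result.strip(" ")


def normalize_whitespace_after(text):
    """Refactored version: same behavior, expressed with split and join."""
    return " ".join(part for part in text.split(" ") if part)


# The same inputs and expected outputs are used for both versions.
cases = [
    ("a  b", "a b"),
    ("  lead", "lead"),
    ("trail  ", "trail"),
    ("one two", "one two"),
    ("   ", ""),
    ("", ""),
]

for text, expected in cases:
    assert normalize_whitespace_before(text) == expected
    assert normalize_whitespace_after(text) == expected
print("both versions agree on", len(cases), "cases")
```

If the refactored version changed any output, one of the assertions would fail and point to the input that behaves differently.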

What to test in document tools?

For each document processing function, ask these questions:
- What should happen when processing different document types?
- How do I want to use this document analysis function?
- What are the function’s document inputs and outputs?
- What are the edge cases for document processing?
Test the system’s expected behavior, not its implementation
Test the public interface of document processing tools
Transform detected defects into repeatable test cases
Later, as schedule permits, assess adequacy of tests with coverage.py
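
The sketch below illustrates two of these ideas: edge-case tests and a defect turned into a repeatable test. The word_frequencies function and the defect it once had are hypothetical; the tests call only the function's public interface.

```python
from collections import Counter


def word_frequencies(text):
    """Hypothetical function: count words, ignoring case and edge punctuation."""
    words = (word.strip(".,;:!?\"'()").lower() for word in text.split())
    return Counter(word for word in words if word)


def test_regression_case_is_ignored():
    """Repeatable test for a hypothetical defect: 'The' and 'the' counted separately."""
    assert word_frequencies("The cat saw the dog.")["the"] == 2


def test_edge_cases():
    assert word_frequencies("") == Counter()                 # empty document
    assert word_frequencies("... !!!") == Counter()          # punctuation only
    assert word_frequencies("word") == Counter({"word": 1})  # single word


test_regression_case_is_ignored()
test_edge_cases()
print("edge-case and regression tests passed")
```

Because the tests check results rather than how word_frequencies computes them, the implementation can change without rewriting the tests.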

Testing aids document tool design

Software testing helps refine document engineering tool design
Interplay between testing and document tool design:
- Identify the data (documents) and operations (processing functions)
- Specify what should happen when processing documents
- Write a unit test case to encode expected behavior
- Confirm that all test cases pass correctly
- Refactor code to improve document processing design
- Repeatedly run test suite to confirm correctness
Systems with good designs are easier to test and maintain!

Don’t benchmark until you are done testing!

Testing: Use test cases to confirm document tools produce correct output for sample documents and operations
Benchmarking: Measure timing and performance of document processing operations like parsing
Running experiments on incorrect document tools may compromise results. Always run a small trial first!
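
A minimal sketch of this ordering appears below. The count_sentences function is hypothetical, the input size and repetition count are arbitrary assumptions, and the measured time will differ from machine to machine.

```python
import re
import timeit


def count_sentences(text):
    """Hypothetical document operation: count non-empty sentence fragments."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


# Test first: confirm correct output on small inputs.
assert count_sentences("Hello world. This is a test.") == 2
assert count_sentences("") == 0

# Small trial: run the operation once on the benchmark input and check the result.
large_document = "A short sentence. " * 10_000
assert count_sentences(large_document) == 10_000

# Benchmark only after the checks pass.
elapsed = timeit.timeit(lambda: count_sentences(large_document), number=20)
print(f"20 runs took {elapsed:.4f} seconds")
```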

Test a document analysis function

import re
from typing import Dict, Any

def document_summary(text: str) -> Dict[str, Any]:
    """Generate a comprehensive summary of document statistics."""
    words = [word for word in text.split() if any(char.isalnum() for char in word)]
    word_count = len(words)
    sentences = re.split(r'[.!?]+', text)
    sentence_count = len([s for s in sentences if s.strip()])
    paragraphs = [p for p in text.split('\n\n') if p.strip()]
    paragraph_count = len(paragraphs)
    avg_words_per_sentence = word_count / sentence_count if sentence_count > 0 else 0
    return {
        'word_count': word_count, 'sentence_count': sentence_count,
        'paragraph_count': paragraph_count,
        'avg_words_per_sentence': round(avg_words_per_sentence, 1)
    }

sample_text = "Hello world. This is a test."
result = document_summary(sample_text)
print(f"Words: {result['word_count']}, Sentences: {result['sentence_count']}")
assert result['word_count'] == 6
assert result['sentence_count'] == 2

Words: 6, Sentences: 2

Test the `document_summary` function

Implement a Python function like document_summary
Create an input document as a string like sample_text
Call the function and collect the output in result
Use assert statements to confirm correctness of output
Manually inspect printed output for additional confidence
Yet, can we adopt a more automated approach to testing? Let’s explore the basics of automated testing with pytest! Frameworks make testing easy and repeatable.
You can use uv to add pytest as a project dependency!
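
As a first step toward that automation, the checks can be moved into functions whose names start with test_, which is the naming convention pytest uses to discover tests. The sketch repeats document_summary so it runs on its own; the three test functions and their inputs are assumptions for illustration.

```python
import re
from typing import Any, Dict


def document_summary(text: str) -> Dict[str, Any]:
    """Copy of the function from the earlier slide."""
    words = [word for word in text.split() if any(char.isalnum() for char in word)]
    word_count = len(words)
    sentences = re.split(r"[.!?]+", text)
    sentence_count = len([s for s in sentences if s.strip()])
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    paragraph_count = len(paragraphs)
    avg_words_per_sentence = word_count / sentence_count if sentence_count > 0 else 0
    return {
        "word_count": word_count, "sentence_count": sentence_count,
        "paragraph_count": paragraph_count,
        "avg_words_per_sentence": round(avg_words_per_sentence, 1),
    }


def test_two_sentences():
    result = document_summary("Hello world. This is a test.")
    assert result["word_count"] == 6
    assert result["sentence_count"] == 2
    assert result["paragraph_count"] == 1
    assert result["avg_words_per_sentence"] == 3.0


def test_empty_document():
    result = document_summary("")
    assert result == {
        "word_count": 0, "sentence_count": 0,
        "paragraph_count": 0, "avg_words_per_sentence": 0,
    }


def test_two_paragraphs():
    result = document_summary("First paragraph.\n\nSecond paragraph!")
    assert result["paragraph_count"] == 2
    assert result["sentence_count"] == 2
```

Saved in a file whose name starts with test_, these functions need no print statements and no manual inspection.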

Advanced testing for document analysis tools

Automated testing frameworks like pytest and unittest:
- Enable structured definition and running of tests
- Run parameterized test cases with pytest
- Support property-based testing with hypothesis

Testing `DayOfTheWeek` with `pytest`

from daydetector.dayoftheweek import DayOfTheWeek

def test_init():
    """Test the DayOfTheWeek class."""
    d = DayOfTheWeek("F")
    assert d.name() == "Friday"
    d = DayOfTheWeek("Th")
    assert d.name() == "Thursday"
    d = DayOfTheWeek("W")
    assert d.name() == "Wednesday"
    d = DayOfTheWeek("T")
    assert d.name() == "Tuesday"
    d = DayOfTheWeek("M")
    assert d.name() == "Monday"

Standard format for test names and files
Automated discovery and running of the tests
Extension through the use of plugins

Parameterized Tests with `pytest`

import pytest
from daydetector.dayoftheweek import DayOfTheWeek

@pytest.mark.parametrize(
    "abbreviation, expected",
    [
        ("M", "Monday"),
        ("T", "Tuesday"),
        ("W", "Wednesday"),
        ("Th", "Thursday"),
        ("F", "Friday"),
        ("Sa", "Saturday"),
        ("Su", "Sunday"),
        ("X", None),
    ],
)
def test_day_name(abbreviation, expected):
    """Use parameterized testing to confirm that lookup works correctly."""
    day = DayOfTheWeek(abbreviation)
    assert day.name() == expected

Express the inputs and the expected outputs in a table!
Same approach works for testing document processing functions
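
For instance, a table of documents and expected paragraph counts might look like the sketch below. The count_paragraphs function is hypothetical, although it mirrors the paragraph logic in document_summary, and the example assumes pytest is installed.

```python
import pytest


def count_paragraphs(text):
    """Hypothetical function: count blocks of text separated by blank lines."""
    return len([p for p in text.split("\n\n") if p.strip()])


@pytest.mark.parametrize(
    "text, expected",
    [
        ("", 0),
        ("One paragraph.", 1),
        ("First.\n\nSecond.", 2),
        ("First.\n\n\n\nSecond.", 2),
        ("\n\nOnly one.\n\n", 1),
    ],
)
def test_count_paragraphs(text, expected):
    """Each row of the table becomes its own test case when pytest runs."""
    assert count_paragraphs(text) == expected
```

Adding a new edge case means adding one row to the table rather than writing another test function.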

Property-based test case

import hypothesis.strategies as st
from hypothesis import given
import pytest
from daydetector.dayoftheweek import DayOfTheWeek

@pytest.mark.parametrize(
    "valid_days",
    [["Monday", "Tuesday", "Wednesday", "Thursday",
      "Friday", "Saturday", "Sunday"]],
)
@given(
    st.text(alphabet=st.characters(), min_size=1, max_size=2)
)
def test_abbreviation_maps_to_name(valid_days, abbreviation):
    """Use property-based testing with Hypothesis to confirm mapping."""
    day = DayOfTheWeek(abbreviation)
    assert day.name() in valid_days or day.name() is None

Hypothesis strategies generate random character inputs for the abbreviation parameter, thereby increasing the input diversity
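
The same idea may carry over to document processing functions. The sketch below assumes Hypothesis is installed and uses a hypothetical count_words function; the two properties were picked for illustration and are not a complete specification.

```python
from hypothesis import given
import hypothesis.strategies as st


def count_words(text):
    """Hypothetical function: count whitespace-separated words in a document."""
    return len(text.split())


@given(st.text())
def test_word_count_properties(text):
    """Properties that should hold for any generated input string."""
    count = count_words(text)
    # A document never has a negative number of words or more words than characters.
    assert 0 <= count <= len(text)
    # Appending a space and one more word adds exactly one to the count.
    assert count_words(text + " extra") == count + 1


# Calling the decorated function runs it against many generated strings.
test_word_count_properties()
```

Unlike a parameterized test, no expected outputs are listed; the test states relationships that must hold for every input.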

Oh, one more thing! You could use a large language model to write tests in `test_dayoftheweek.py`! Wow!

What are the benefits and downsides of using artificial intelligence (AI) to generate tests?
What are situations in which you should and should not use AI to generate tests?
Tests establish confidence in correctness!

Reminder of course goals

Document Creation:
- Design and implement document generation workflows
- Test all aspects of documents to ensure quality and accuracy
- Create frameworks for automated document production
Document Analysis:
- Collect and analyze data about document usage and quality
- Visualize insights to improve documentation strategies
Communicate results and best practices for document engineering
References for this week’s content:
- Online textbook called A First Course on Data Structures in Python
- The ds2 package in the donsheehy/datastructures GitHub repository
