Exploring the Python Programming Language

Gregory M. Kapfhammer

September 1, 2025

What are some key features of the Python programming language?

  • Simple and readable syntax
  • Defining and evaluating expressions
  • Declaring and using variables with types
  • Creating and using collections

Python for document engineering

  • Simple and readable syntax
    • Easy to learn and understand
    • Code looks similar to written English
    • Focus on solving problems, not complex syntax
  • Excellent text processing capabilities
    • Built-in string operations and methods
    • Rich collection types for organizing data
    • Powerful libraries for document manipulation
  • Perfect suite of tools for prosegrammers
    • Combines programming power with clear, readable code
    • Ideal for creating tools that work with documents

Thinking and writing about Python programs

  • Sequence: Run operations one after another in an order
  • Selection: Decide which operations will execute
  • Iteration: Repeat an operation a specific number of times

These three concepts form the foundation of all programming logic! Let’s explore these in greater detail!

Sequence: Run operations one-by-one

def create_greeting(name: str) -> str:
    """Create a personalized greeting message for documents."""
    greeting = "Hello, " + name + "!"
    message = greeting + " Welcome to Document Engineering."
    return message

# call the function with different names
result = create_greeting("Alice")
print(result)
Hello, Alice! Welcome to Document Engineering.
  • Calling the create_greeting function causes each line to run sequentially
  • First: concatenate name with "Hello, " and "!"
  • Second: add the welcome message to create full greeting
  • Third: return the complete message to caller
  • Document connection: generating personalized content for reports!

Selection: Use conditional logic

def format_document_title(title: str, capitalized: bool) -> str:
    """Format a document title based on style preference."""
    if capitalized:
        formatted_title = title.upper()
        result = "DOCUMENT: " + formatted_title
    else:
        formatted_title = title.title()
        result = "Document: " + formatted_title
    return result

# test both formatting styles
print(format_document_title("user guide", True) + " / "
      + format_document_title("user guide", False))
DOCUMENT: USER GUIDE / Document: User Guide
  • The if statement checks the value of the capitalized parameter
  • Different code executes based on whether capitalized is True or False
  • Document connection: adapting content style for different audiences!

Iteration: Repeat an operation

def count_words_in_documents(documents: list) -> dict:
    """Count the words in each document name."""
    word_counts = {}
    for doc_name in documents:
        word_count = len(doc_name.split())  # simple word count
        word_counts[doc_name] = word_count
    return word_counts

# test with sample document names
docs = ["Manual", "Installation Guide", "API Reference Document"]
counts = count_words_in_documents(docs)
print(counts)
{'Manual': 1, 'Installation Guide': 2, 'API Reference Document': 3}
  • word_counts is a dictionary that starts out empty
  • The for loop iterates through each document name in the list
  • Document connection: analyzing multiple documents in a collection!

Collections contain multiple values

  • Strings: str in Python
  • Lists: list in Python
  • Tuples: tuple in Python
  • Dictionaries: dict in Python
  • Sets: set in Python

Let’s explore how these work and how they differ!

Creating and using a string

  • What is the purpose of the type function?
  • What is the purpose of the full_title[9] notation?
  • What is the purpose of the str function?
  • What is the purpose of the + operator?
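
One way to explore the questions above is with a minimal sketch like this one. The variables title and edition and all of the values are assumptions chosen for illustration; only the name full_title and the index 9 come from the slide.

```python
# hypothetical values chosen for illustration
title = "Document Engineering"
edition = 2

# the + operator joins strings; str converts the int to a string first
full_title = title + " Edition " + str(edition)

# the type function reports the type of a value
print(type(full_title))

# indexing starts at 0, so full_title[9] is the tenth character
print(full_title[9])
print(full_title)
```

Without the call to str, the + operator would raise a TypeError because Python does not join a str and an int automatically.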

Creating and using a list

What does this illustrate about the list type in Python?
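
A possible illustration of the list type is sketched here. The variable chapters and its values are assumptions chosen for this example, not the code from the original slide.

```python
# hypothetical chapter titles chosen for illustration
chapters = ["Introduction", "Syntax", "Collections"]

# lists are ordered, so elements are accessed by position
print(chapters[0])

# lists are mutable: they can grow and elements can be replaced
chapters.append("Conclusion")
chapters[1] = "Python Syntax"

print(chapters)
print(len(chapters))
```

This suggests that a list is an ordered, mutable collection that fits document parts that may change over time.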

Creating and using a tuple

  • A tuple is an immutable collection of ordered values
  • What are some invalid operations on a tuple?
    • doc_info[3] = "DOCX"
    • doc_info.append("English")
    • doc_info.remove("PDF")
  • Document connection: storing fixed document properties!
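
The invalid operations listed above can be tried safely with a sketch like the following. The contents of doc_info are assumed for illustration, and the try blocks are only there so that the script keeps running after each error.

```python
# hypothetical metadata: title, version, language, format
doc_info = ("User Manual", "v2.1", "English", "PDF")

# reading by index works just as it does for a list
print(doc_info[3])

# tuples do not support item assignment
try:
    doc_info[3] = "DOCX"
except TypeError as error:
    print("TypeError:", error)

# tuples have no append method, so the attribute lookup fails
try:
    doc_info.append("English")
except AttributeError as error:
    print("AttributeError:", error)

# tuples have no remove method either
try:
    doc_info.remove("PDF")
except AttributeError as error:
    print("AttributeError:", error)
```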

Creating and using a dictionary

  • A dictionary stores key-value pairs in an associative mapping

Details about dictionary use in Python

  • Let’s explore further the source code from the previous slide:
    • The dict function makes an empty dictionary called document
    • Dictionaries store key-value pairs like {"title": "Python Programming Guide"}
    • The keys can be strings, numbers, or other hashable types
    • The values can be of any data type like str, int, and float
    • It is possible to look up a value by its key as in document["title"]
    • Document connection: storing structured document information (e.g., pages in a chapter) in a mapping!
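
The points above can be sketched as follows. The "pages" and "price" keys and their values are assumptions added for illustration; only document and the "title" entry appear in the bullets.

```python
# the dict function makes an empty dictionary
document = dict()

# keys map to values of any type (the last two entries are hypothetical)
document["title"] = "Python Programming Guide"
document["pages"] = 250
document["price"] = 29.99

# look up a value by its key
print(document["title"])

# get returns a default instead of raising KeyError for a missing key
print(document.get("author", "unknown"))
print(document)
```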

Creating and using a set

  • A set is an unordered collection of unique values
  • How are sets similar to and different from lists?

Details about set use in Python

  • Let’s explore further the source code from the previous slide:
    • Sets do not store duplicate values
    • The data in a set must be hashable
    • The add function places more data in a set
    • Repeated calls of add with "python" do not change the set
    • Be careful, {} is an empty dictionary, not a set!
    • Document connection: keep unique tags and keywords!
  • Different collections have different properties and use cases
  • What operations can you perform on these collections?
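
A small sketch of the set behavior described above follows. The variable names tags and empty_braces are assumptions chosen for this example.

```python
# set() makes an empty set
tags = set()

# add places data in the set; the repeated "python" is ignored
tags.add("python")
tags.add("documents")
tags.add("python")
print(len(tags))

# be careful: {} makes an empty dictionary, not an empty set
empty_braces = {}
print(type(empty_braces))
print(type(tags))
```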

Operations to perform on most collections

  • Determine the length of a collection
  • Add an element to or remove an element from a collection
  • Access an element in a collection
  • Determine if a collection contains an element
  • Iterate through all of the elements in a collection
  • Slice a collection to get a subset of its elements

Sizing and slicing collections
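
As a minimal sketch of sizing and slicing, with all variable names and values assumed for illustration:

```python
# hypothetical collections chosen for illustration
chapter_titles = ["Introduction", "Syntax", "Collections", "Conclusion"]
doc_info = ("User Manual", "v2.1", "English", "PDF")
title = "Document Engineering"
keywords = {"python", "documents"}

# len works on every built-in collection
print(len(chapter_titles), len(doc_info), len(title), len(keywords))

# slicing uses [start:stop] and leaves out the element at stop
print(chapter_titles[1:3])
print(doc_info[:2])
print(title[:8])
```

Slicing works on ordered sequences (strings, lists, and tuples). Sets and dictionaries do not support slice notation.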

Iterating through lists and tuples
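
One way to iterate through a list and a tuple is sketched here; the data is assumed for illustration.

```python
# hypothetical data chosen for illustration
chapter_titles = ["Introduction", "Syntax", "Collections"]
doc_info = ("User Manual", "v2.1", "English", "PDF")

# enumerate pairs each element with a counter, here starting at 1
for number, chapter in enumerate(chapter_titles, start=1):
    print(number, chapter)

# a tuple is iterated in exactly the same way as a list
for value in doc_info:
    print(value)
```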

Iterating through sets and strings
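
Iteration over a set and a string might look like the following sketch, where keywords and file_format are assumed names. Because a set is unordered, the example sorts it first to get a predictable order.

```python
# hypothetical data chosen for illustration
keywords = {"python", "documents", "markdown"}
file_format = "PDF"

# sorted returns a list, giving a predictable order for the loop
for keyword in sorted(keywords):
    print(keyword)

# iterating through a string visits one character at a time
for character in file_format:
    print(character)
```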

Iterating through a dictionary
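
A possible sketch of dictionary iteration follows; the contents of document are assumed for illustration.

```python
# hypothetical document properties chosen for illustration
document = {"title": "Python Programming Guide", "pages": 250}

# iterating through a dictionary directly visits its keys
for key in document:
    print(key)

# items yields (key, value) pairs in insertion order
for key, value in document.items():
    print(key, "->", value)
```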

Mutable and immutable collections

  • Mutable Collections: can be changed after creation
    • Lists (list): great for document sections that may change
    • Dictionaries (dict): perfect for document properties
    • Sets (set): ideal for collections of unique keywords
  • Immutable Collections: cannot be changed after creation
    • Tuples (tuple): excellent for fixed document metadata
    • Strings (str): text content that won’t be modified
  • Understanding mutability helps prosegrammers choose the right collection type for document data! Carefully pick your collection!
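
The difference between mutable and immutable collections can be seen in a short sketch. The names sections, title, and upper_title are assumptions chosen for this example.

```python
# a list is mutable, so append changes it in place
sections = ["Introduction", "Methods"]
sections.append("Results")
print(sections)

# a string is immutable, so upper returns a new string
title = "user guide"
upper_title = title.upper()

# the original string is unchanged
print(title)
print(upper_title)
```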

Containment checking for collections
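
Containment checks use the in operator. The sketch below applies it to a list, a tuple, and a string, with data assumed for illustration.

```python
# hypothetical data chosen for illustration
library = ["Python Guide", "Web Development", "Data Analysis"]
doc_metadata = ("User Manual", "v2.1", "English", "PDF")
title = "Python Programming Guide"

# for lists and tuples, in compares against whole elements
print("Python Guide" in library)   # True
print("Python" in library)         # False: no element equals "Python"
print("DOCX" in doc_metadata)      # False

# for strings, in checks for a substring
print("Guide" in title)            # True
```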

More containment checking
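
For dictionaries and sets, containment checking could be sketched like this, again with assumed data. Note that in checks the keys of a dictionary, not its values.

```python
# hypothetical data chosen for illustration
document = {"title": "Python Programming Guide", "pages": 250}
keywords = {"python", "documents"}

# in checks the keys of a dictionary
print("title" in document)          # True
print(250 in document)              # False: 250 is a value, not a key

# use values to check the values instead
print(250 in document.values())     # True

# not in is the negated form of the check
print("markdown" not in keywords)   # True
```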

Working with document content

def find_document_in_library(library: list, document_name: str) -> bool:
    """Check if a document exists in our document library."""
    # assume the document is not in the library
    found = False
    # check if the document name is in the library
    if document_name in library:
        found = True
    # return boolean to indicate whether document was found
    return found

# test with sample document library
my_library = ["Python Guide", "Web Development", "Data Analysis"]
result = find_document_in_library(my_library, "Python Guide")
print("Found 'Python Guide':", result)
Found 'Python Guide': True
  • How does find_document_in_library work?
  • Can you explain its output? Can you enhance it? How?

Working with document metadata

def check_document_property(metadata: tuple, property_name: str) -> bool:
    """Check if a specific property exists in document metadata."""
    # assume the property is not in the metadata
    found = False
    # check if the property name is in the metadata tuple
    if property_name in metadata:
        found = True
    # return boolean to indicate whether property was found
    return found

# test with sample document metadata
doc_metadata = ("User Manual", "v2.1", "English", "PDF")
result = check_document_property(doc_metadata, "PDF")
print("Contains 'PDF' format:", result)
Contains 'PDF' format: True
  • What is the type of the parameter called metadata?
  • What is the type of the parameter called property_name?
  • What is the return type of check_document_property?

Key features of Python: expressions

Defining and evaluating expressions
  • pages_per_chapter is a variable
  • total_pages is a variable defined by an expression
  • The * operator multiplies two numbers
  • print outputs text to the console or the slides
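
A sketch consistent with the bullets above is shown here. The variable chapter_count and both numeric values are assumptions, since the slide's own code is not reproduced.

```python
# hypothetical values chosen for illustration
pages_per_chapter = 20
chapter_count = 12

# the * operator multiplies two numbers
total_pages = pages_per_chapter * chapter_count

print("Total pages:", total_pages)
```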

Key features of Python: variables

Declaring and using variables with types
  • document_title is a variable of type str
  • page_count is a variable of type int
  • is_published is a variable of type bool
  • print outputs text to the console or the slides
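
The three variables named above could be declared as in this sketch. The values are assumed for illustration, and the type annotations are optional hints that Python does not enforce when the program runs.

```python
# hypothetical values; the annotations document the intended types
document_title: str = "Python Programming Guide"
page_count: int = 250
is_published: bool = True

print(document_title, page_count, is_published)
print(type(document_title), type(page_count), type(is_published))
```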

Key features of Python: collections

Creating and using collections

  • chapter_titles is a variable of type list
  • Each value inside of chapter_titles is a str
  • chapter_titles[0] accesses the first list element
  • print outputs text to the console or the slides
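
As a minimal sketch of the bullets above, with the chapter titles themselves assumed for illustration:

```python
# hypothetical chapter titles; each value in the list is a str
chapter_titles = ["Introduction", "Python Basics", "Collections"]

# index 0 accesses the first list element
print(chapter_titles[0])
print(type(chapter_titles))
```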

Building your Python foundation

  • Master the fundamentals: sequence, selection, iteration
  • Understand collections: strings, lists, tuples, dictionaries, sets
  • Use document examples: titles, chapters, metadata, keywords
  • Connect Python to document engineering:
    • Process text and analyze content
    • Organize document information systematically
    • Automate repetitive document tasks

How exciting! You’re ready to become a prosegrammer! Python + documents = powerful automation and analysis capabilities!

Overall document engineering setup

Tips for effective document engineering setup

  • Devote time outside class to installing and configuring tools

  • Confirm that most tools work during this week’s lab session

  • Create and render test documents with the provided examples

  • Complete the first document engineering project on time

  • Contribute to collaborative documentation projects

  • Prepare for first document engineering skill check

  • Get ready for an exciting journey into document engineering!

  • If you are having trouble, publicly ask for help on Discord!

  • If you would like to use a token, please contact the course instructor!

Document engineering skill-check reminders

  • What is a document engineering skill-check?
  • When do these skill-checks happen?
  • What Python programming tasks are involved?
  • How do I successfully complete a skill-check?

Programming assessments for your document engineering skills!

What is a skill-check?

  • Programming tasks completed on certain Fridays during class
  • Individual assessment of your document engineering skills
  • GitHub Classroom repository provided as your starting point
  • Contains TODO markers and blank functions for you to complete
  • Automated checking ensures your solution meets requirements
  • Time-limited completion of tasks during the class session

It’s a focused coding challenge that assesses what you’ve learned and confirms that you have your tools set up correctly!

Step 1: Navigate to exam/ directory

# navigate to your skill-check repository after cloning
cd <your-skillcheck-repository-name>

# navigate to the exam directory where all the work happens
cd exam

# verify you're in the right place for the skill-check
ls -alg
  • Important: All skill-check work happens in the exam/ directory
  • You should see files and directories like these: questions/, tests/, gatorgrade.yml, pyproject.toml, and uv.lock
  • The questions/ directory contains files with TODO markers to complete
  • The tests/ directory contains automated tests to verify your work
  • The gatorgrade.yml file configures the gatorgrade assessment tool

Step 2: Run the assessment tool

# run gatorgrade to see what needs to be completed
uvx gatorgrade

# this will show you:
# ✅ Checks that are currently passing
# ❌ Checks that need work to pass
# 📊 Overall completion percentage
  • gatorgrade is the automated assessment tool
  • After installing uv, you can type uvx gatorgrade
  • Red X marks show what still needs to be fixed
  • Green checkmarks show completed requirements
  • Task: Iteratively complete work in required files
  • Goal: Keep working to get 100% of checks to pass

Step 3: Complete programming tasks

  • Open files in the questions/ directory (e.g., question_one.py)
  • Find TODO markers that indicate where to add code
  • Read function docstrings to understand what a function should do
  • Write Python code to implement the required functionality
  • Add comments to explain your code clearly
  • Remove TODO markers when you complete each section

Don’t forget: You need to implement the functions and remove the TODO markers! You can use uvx gatorgrade to check your progress and see which functions are working! It all works in your terminal window!
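
To make these steps concrete, here is a sketch of what a completed function might look like after the TODO marker is removed. The function count_headings and its docstring are hypothetical and are not taken from any actual skill-check repository.

```python
def count_headings(lines: list) -> int:
    """Count the lines that start with a Markdown heading marker."""
    # the starter file would have held a TODO marker where this code is now
    count = 0
    # check each line and count those that begin with "#"
    for line in lines:
        if line.startswith("#"):
            count += 1
    return count

# quick manual check before running the automated checks
sample = ["# Title", "Some text", "## Section", "More text"]
print(count_headings(sample))
```

The docstring states what the function should return, so reading it first is a reasonable way to decide what code to write.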

Step 4: Test your progress frequently

# run gatorgrade after making changes
uvx gatorgrade

# you should see your completion percentage improve
# keep working until you reach 100%

# or, for specific test details, you can also run:
uv run pytest -v
  • Run gatorgrade frequently to track your progress
  • Each change should improve your completion percentage
  • Don’t wait until the end to test your work
  • Green checkmarks confirm your code is working correctly
  • Reported score is your current score on the skill-check
  • Ask instructor if you get stuck or need assistance

Step 5: Submit your work with Git

# add your completed work to Git staging area
git add .

# create a commit with a descriptive message
git commit -m "Complete skill-check programming tasks"

# push your work to GitHub
git push origin main
  • Push frequently during the skill-check, not just at the end
  • Use descriptive commit messages that explain what you completed
  • GitHub Actions will automatically run additional tests on your code
  • Final score reported in GitHub Actions, matching local gatorgrade
  • Score improvements may occur each time you commit and push!

Avoid skill-check mistakes

  • Not reading instructions carefully: read the entire README.md
  • Forgetting to remove TODO markers: avoid automatic failures
  • Not running gatorgrade frequently: test your work as you go
  • Waiting until the last minute to push: commit and push regularly
  • Modifying test files: never change files in the tests/ directory
  • Not completing the Honor Code: you must digitally sign the pledge

Remember: Read carefully, code thoughtfully, test frequently, and submit regularly! If you get stuck, make sure to chat with the instructor!

Skill-check success checklist

Suggestions for ensuring the successful completion of a skill-check

Reminder of course goals

  • Document Creation:
    • Design and implement document generation workflows
    • Test all aspects of documents to ensure quality and accuracy
    • Create frameworks for automated document production
  • Document Analysis:
    • Collect and analyze data about document usage and quality
    • Visualize insights to improve documentation strategies
  • Communicate results and best practices for document engineering
  • Check syllabus for details about Document Engineering course!