AppSheet Parser Suite - Technical Documentation

Overview and Purpose

The AppSheet Parser Suite addresses a critical limitation in AppSheet development: while the platform's HTML documentation export contains comprehensive information about an application's structure, this information is difficult to navigate, search, and understand. This suite transforms that HTML export into structured CSV data that can be analyzed, queried, and used to identify unused elements. It provides AppSheet developers with insights that are otherwise difficult to obtain, particularly for large, complex applications that may have accumulated technical debt over years of development. By producing standardized CSV outputs and publishing the complete source code on GitHub, the suite provides a foundation that other developers can build upon - whether extending the parsing capabilities, creating new analysis tools, or integrating the data into their own workflows.

Key Design Achievements

1. Modular Architecture - The overwhelming complexity of parsing AppSheet's HTML documentation has been divided into 18+ specialized Python scripts, each handling a specific parsing or analysis task. This division makes debugging feasible and allows other developers to understand, modify, or extend individual components without grappling with the entire system.

2. Working Around HTML Deficiencies - AppSheet's HTML export lacks critical metadata: actions don't indicate whether they're system-generated or user-created, view categorization and navigation positions are unclear, and bot configurations are entirely absent. The suite includes interactive validation systems that guide users through obtaining this missing data from the AppSheet editor interface, creating supplementary text files that augment the incomplete HTML.

3. Self-Documenting CSV Design - Every parser produces CSV files with consistent field ordering and uses a triple-pipe delimiter (|||) for multi-value fields. This standardization enables both human analysis (the CSVs can be opened in Excel) and programmatic processing by downstream tools. The CSV format serves as both a final result and an intermediate data structure for further analysis.

The suite is particularly valuable for:

Authors: Kirk Masden & Claude
Language: Python 3.6+ (tested with Python 3.13.2)
Dependencies: BeautifulSoup4 for HTML parsing
Environment: Python virtual environment required

↑ Back to Table of Contents

Setup Instructions

Step 1: Download the Scripts

Download the AppSheet Parser Suite from GitHub: https://github.com/KirkMasden/appsheet_parser_and_orphan_detector

Click the green "Code" button and select "Download ZIP". This will download all the scripts in a single compressed file.

Step 2: Prepare the Required Files

Before organizing your folders, you need to obtain the HTML documentation from AppSheet and create several supporting files that provide information missing from the HTML export.

Downloading the Application Documentation HTML

These instructions are for the new AppSheet editor (for legacy editor, steps differ):

  1. Open your app in the AppSheet editor
  2. Click on "Settings" in the left sidebar
  3. Select "Information"
  4. Scroll to the bottom of the page to find "The documentation page for this app is available here"
  5. Click the link to open the documentation in a new browser tab
  6. In Chrome browser: Go to File → Save Page As...
  7. In the save dialog, select "Webpage, HTML Only" as the format
  8. Save the file without changing its name (it will be something like Application Documentation.html)

Important: Do not rename the file when saving. Keep the original name as provided by AppSheet. The file will be differentiated from other similar HTML files by the name of the folder you will put it in (see directions below).

Creating actions.txt

This file identifies which actions are system-generated versus user-created:

  1. Open your app in the AppSheet legacy editor
  2. Click on "Behavior" in the left sidebar
  3. Scroll to the bottom of the actions list
  4. Click "Show all" to reveal system-generated actions
  5. Select ALL text on the page (Ctrl+A on Windows, Cmd+A on Mac)
  6. Copy the text (Ctrl+C or Cmd+C)
  7. Paste into a new text file and save as actions.txt

Video Tutorial: Watch how to create actions.txt:

Creating views1.txt and views2.txt

These files help identify system views and navigation structure. You need to create TWO separate files:

For views1.txt:

  1. Open your app in the AppSheet legacy editor
  2. Click on "UX" then "Views" in the left sidebar
  3. Make sure "Show system views" is CHECKED (you should see system-generated views in the list)
  4. Select ALL text on the page (Ctrl+A or Cmd+A)
  5. Copy and paste into a new text file
  6. Save as views1.txt

For views2.txt:

  1. Stay on the same Views page
  2. Click to UNCHECK "Show system views" (system views should disappear from the list)
  3. Select ALL text on the page (Ctrl+A or Cmd+A)
  4. Copy and paste into a new text file
  5. Save as views2.txt

Creating bot_actions.txt (Optional)

If your app uses bots that trigger actions, create this file to prevent those actions from being incorrectly identified as orphans:

  1. Review your bot configurations in the AppSheet editor under "Automation" → "Bots"
  2. Note which actions are called by your bots
  3. Create a text file with one action name per line
  4. Save as bot_actions.txt

Example bot_actions.txt:

Send_Email_Notification
Update_Status
Archive_Old_Records

Step 3: Organize Your Folders

Now that you have your text files ready, create a main project folder on your computer (for example: AppSheetAnalysis). Inside this folder:

  1. Extract the downloaded ZIP file - This will create a folder named appsheet_parser_and_orphan_detector-main. Move this folder into your main project folder.
  2. Create a data folder (for example: MyApp_Data) and place it in your main project folder. Put the following files in this data folder:

Your folder structure should now look like this:

AppSheetAnalysis/
├── appsheet_parser_and_orphan_detector-main/ (the extracted scripts)
└── MyApp_Data/ (your data folder)
    ├── Application Documentation.html
    ├── actions.txt
    ├── views1.txt
    ├── views2.txt
    └── bot_actions.txt (optional)

Step 3: Set Up Python Environment

Open a terminal (Mac/Linux) or Command Prompt (Windows). Navigate to your main project folder using the cd command:

Mac/Linux:

# Change to your project directory (replace with your actual path)
cd ~/Documents/AppSheetAnalysis

# Create a Python virtual environment
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate

# Install the required package
pip install beautifulsoup4

Windows:

# Change to your project directory (replace with your actual path)
cd C:\Users\YourName\Documents\AppSheetAnalysis

# Create a Python virtual environment
python -m venv venv

# Activate the virtual environment
venv\Scripts\activate

# Install the required package
pip install beautifulsoup4

Note: The cd command means "change directory". Replace the example paths with the actual location where you created your project folder.

After setting up the Python environment, your folder structure will have three folders:

AppSheetAnalysis/
├── appsheet_parser_and_orphan_detector-main/ (the Python scripts)
├── MyApp_Data/ (your data folder)
└── venv/ (the Python virtual environment - created automatically)

Step 4: Run the Parser

Follow these steps each time you want to analyze your AppSheet data:

Mac/Linux:

# 1. Navigate to your project folder
cd ~/Documents/AppSheetAnalysis

# 2. Activate the virtual environment
source venv/bin/activate

# 3. Run the parser (replace filenames with your actual names)
python appsheet_parser_and_orphan_detector-main/master_parser_and_orphan_detector.py "MyApp_Data/Application Documentation.html"

Windows:

# 1. Navigate to your project folder
cd C:\Users\YourName\Documents\AppSheetAnalysis

# 2. Activate the virtual environment
venv\Scripts\activate

# 3. Run the parser (replace filenames with your actual names)
python appsheet_parser_and_orphan_detector-main\master_parser_and_orphan_detector.py "MyApp_Data\Application Documentation.html"

The script will:

  1. Parse your HTML file and text files (see sample output)
  2. Create a new timestamped output folder (like 20250107_143022_YourApp_parse)
  3. Generate CSV files for each component type
  4. Run orphan detection automatically
  5. Ask if you want to explore dependencies interactively

Important: Always run these three steps in order. The virtual environment must be activated (you'll see (venv) in your terminal prompt) before running the Python scripts.

Step 5: Explore Results (Optional)

When parsing completes, the script asks "Would you like to explore dependencies now?" If you answer "n", the script displays a command you can copy and use later. This command includes the full paths specific to your setup, so it will work from any directory after you activate your virtual environment.

Tips for Success

↑ Back to Table of Contents

Architecture Overview

The suite consists of 18+ Python scripts organized into four categories:

  1. Core Infrastructure - Master orchestrator and shared base class
  2. Parser Scripts - Extract data from HTML into CSV files (Phases 1-7)
  3. Orphan Detectors - Identify unused components (Phases 8-13)
  4. Dependency Analyzers - Interactive tools for exploring relationships (Phase 14)

The system processes data in strict dependency order, with each phase potentially using outputs from previous phases. For example, the slice parser must run first because other parsers need the slice-to-table mappings it generates.

Why External Text Files Are Required

AppSheet's HTML documentation export, while comprehensive, lacks certain critical metadata that must be obtained through supplementary text files:

These text files compensate for deficiencies in AppSheet's HTML export and enable accurate orphan detection and component categorization.

Architectural Decision: Separated Navigation Processing

The navigation analysis is deliberately split into two phases:

This separation provides:

↑ Back to Table of Contents

Master Orchestrator: master_parser_and_orphan_detector.py

Purpose: Coordinates the entire workflow, ensuring scripts run in the correct order with proper data dependencies satisfied.

Key Responsibilities

Usage Examples

# Full parse with orphan detection
python master_parser_and_orphan_detector.py "Application Documentation.html" -o output_dir

# Single parser mode
python master_parser_and_orphan_detector.py "Application Documentation.html" -v # views only

# Orphan detection only (on existing parsed data)
python master_parser_and_orphan_detector.py "Application Documentation.html" -ao # action orphans

Execution Flow

  1. Validates HTML file exists
  2. Creates timestamped output directory
  3. Executes parsers in dependency order (Phases 1-7)
  4. Automatically runs orphan detectors (Phases 8-13)
  5. Generates summary statistics
  6. Prompts to launch Dependency Analyzer Hub

Error Handling

Each phase is wrapped in try/except blocks to prevent cascade failures. The orchestrator logs errors but continues with remaining phases, returning partial results even if some parsers fail.

Output Summary Generation

After completion, reads all generated CSV files to provide:

↑ Back to Table of Contents

Shared Foundation: base_parser.py

Purpose: Abstract base class that provides common functionality for all parser scripts, ensuring consistent reference extraction, HTML handling, and data normalization across the suite.

Core Capabilities

HTML Loading and Initialization

Reference Extraction Engine

The reference extraction is the heart of the parser suite. It identifies where columns are referenced throughout the application:

Pattern Detection:

Slice Resolution:

JSON Configuration Parsing:

Design Decision: Triple-Pipe Delimiter

The suite uses ||| as the delimiter for multi-value fields throughout all CSV outputs. This is critical because AppSheet allows commas in component names, formulas, and column lists, which makes comma-based splitting unreliable. The triple-pipe pattern was chosen as it's extremely unlikely to appear in AppSheet formulas or user-entered text.

Key Methods for Child Parsers

extract_references_from_text(text, context_table=None)
Returns list of reference dictionaries containing type, table, column, and metadata.

extract_references_from_json(json_str, context_table=None)
Parses JSON and recursively extracts references with path tracking.

resolve_table_reference(table_or_slice_name)
Core slice-to-table resolution logic with case-insensitive matching.

build_absolute_references(references)
Converts all references to Table[Column] format for consistency.

Abstract Method Contract

All child parsers must implement a parse() method that returns a list of dictionaries representing parsed components.

↑ Back to Table of Contents

Phase 1: Parsing Tables and Slices (slice_parser.py)

Purpose: First parser in the chain that extracts slice definitions and creates the critical slice-to-table mapping used by all subsequent parsers. Also performs a complete inventory of all tables in the application.

Key Capabilities

Complete Table Discovery

Unlike other parsers, slice_parser performs a comprehensive table inventory:

Slice Data Extraction

For each slice, extracts:

Two-Pass Reference Resolution

  1. First pass: Extracts all slice data and builds slice-to-table map
  2. Second pass: Re-processes all formulas using the complete mapping to resolve slice references to actual tables

The two-pass strategy ensures that slice formulas are processed only after the complete slice_to_table_map is constructed. This is essential for resolving expressions that contain nested or indirect references to other slices.

Critical Integration Points

The slice_to_table_map dictionary is the most important output, enabling all downstream parsers to:

Output Format

The parser generates appsheet_slices.csv with fields including slice_name, source_table, referenced_columns, row_filter_condition, slice_columns, slice_actions, and more. Multi-value fields use the ||| delimiter.

↑ Back to Table of Contents

Phase 2: Parsing Columns (column_parser.py)

Purpose: Extracts all column definitions from the HTML schema sections, properly associates them with tables (resolving slice references), and identifies system-generated tables to exclude from analysis.

Dependencies

Key Capabilities

System Table Detection

Advanced Reference Extraction

The parser performs multi-level reference extraction from:

Virtual Column Detection

Hidden Column Detection

Identifies columns that are always hidden:

Output Format

The parser generates appsheet_columns.csv with comprehensive column metadata including table_name, column_name, unique_identifier, is_virtual, type, referenced_columns, and all formula fields.

↑ Back to Table of Contents

Phase 3: Parsing Format Rules (format_rules_parser.py)

Purpose: Extracts format rules (conditional formatting) that control the visual appearance of columns and actions based on conditions. Tracks which columns are referenced in rule conditions.

Key Capabilities

Formatted Item Parsing

Separates columns from actions in the formatted items list:

Settings Extraction

Converts JSON settings to human-readable format, including:

Reference Extraction

Design Decision

The parser only extracts column references from the condition field, not from the list of formatted items. This is intentional because formatted items are targets (what gets styled), not dependencies. Only the condition determines when formatting applies.

↑ Back to Table of Contents

Phase 4: Parsing Actions (actions_parser.py)

Purpose: Extracts all actions from the HTML, determines system-generated vs user-created status using the actions.txt helper file, and tracks action dependencies including grouped action sequences.

Interactive Actions.txt Validation

The parser provides step-by-step guidance when actions.txt is missing:

  1. In the legacy mode of the AppSheet editor:
  2. Click on 'Behavior' in the left sidebar
  3. Scroll to the bottom of the page
  4. Click on 'Show all' to display system actions
  5. Select ALL text on the page (Ctrl+A / Cmd+A)
  6. Copy and paste into a file named 'actions.txt'

Key Capabilities

Actions.txt Parsing

Handles two formats:

Advanced Action Type Detection

Primary method using JSON structure analysis to identify:

Grouped Action Dependency Tracking

Extracts referenced actions from grouped actions and maintains action_dependencies dictionary for chain analysis.

System Status Assignment

Three-tier matching strategy:

  1. Compound key with table: Table||ActionName||SYSTEM
  2. Direct normalized name match
  3. Fuzzy matching with space normalization

Output Format

The parser generates appsheet_actions.csv with fields including action_name, source_table, action_type, referenced_columns, referenced_actions, navigate_target, and is_system_generated status.

↑ Back to Table of Contents

Phase 5: Parsing Views (views_parser.py)

Purpose: Extracts all view definitions from the HTML, enriches them with data from external views text files, and tracks which columns and actions each view displays and references.

Interactive Views Text File Validation

The parser provides comprehensive guidance for obtaining required view mapping files:

  1. Navigate to UX > Views in AppSheet legacy editor
  2. Copy all text with "Show system views" checked → save as views1.txt
  3. Copy all text with "Show system views" unchecked → save as views2.txt

Key Capabilities

Automatic System View Detection

Compares file sizes to identify which contains system views:

Multi-Source Data Loading

The parser loads data from multiple Phase outputs to enrich view information:

Available vs Displayed Tracking

Critical distinction between what's accessible and what's shown:

Output Format

The parser generates appsheet_views.csv with comprehensive view metadata including view_name, view_type, category, data_source, available_columns, view_columns, available_actions, referenced_actions, and more.

↑ Back to Table of Contents

Phase 6: Navigation Target Parsing (action_target_parser.py)

Purpose: Parses navigation expressions from actions to extract structured navigation targets, preparing data for edge generation by identifying where each action navigates to and under what conditions.

Key Capabilities

Navigation Expression Parsing

The parser handles multiple navigation patterns:

  1. Direct Navigation: #control=ViewName in URLs
  2. LINKTOVIEW Expressions: Both quoted and unquoted formats
  3. LINKTOROW Expressions: Tracks parentheses depth to correctly split parameters
  4. Conditional Navigation: IF/IFS expressions with context conditions

Context Condition Extraction

Identifies where actions should be visible/active:

Two-Layer Context Handling

The parser merges conditions from two sources:

  1. Action-level: From only_if_condition field (when action is visible)
  2. Expression-level: From IF/IFS branches (where to navigate)

Output Files

↑ Back to Table of Contents

Phase 7: Navigation Edges Generation (navigation_edge_generator.py)

Purpose: Generates navigation edges from parsed action targets, determining which views can navigate to which other views by validating action visibility, context conditions, and expanding group actions.

Key Capabilities

Action Visibility Validation

The generator implements view-type-specific visibility rules:

Group Action Expansion

Recursively expands group actions with condition inheritance:

Context Condition Validation

Validates six types of context requirements:

Output Format

The generator creates navigation_edges.csv containing all valid navigation paths with comprehensive metadata including source_view, target_view, source_action, context conditions, and availability type.

↑ Back to Table of Contents

Understanding Orphan Detection

Orphan detection identifies components that appear unused, but there are two important concepts to understand when interpreting results.

View Reachability Affects All Detection

The suite first identifies views that are unreachable from the app's root views, that is, views in the primary navigation bar or side menu that users can access directly (Phase 8). This matters because actions, columns, and format rules that appear only on unreachable views will be flagged as potential orphans in subsequent phases.

This is intentional: if users cannot navigate to a view, they cannot access what's on it. However, if you're still building your app and haven't connected all views to the menu yet, components on those unconnected views will appear as orphans.

Tip: Check the contents of "potential_view_orphans.csv" first. If a view you're still developing appears as an orphan, the actions and columns in that view may also be flagged if they are not used in an accessible view.

Detection is Iterative

The suite does not automatically flag "child" components of orphaned items. For example:

This conservative approach prevents cascading false positives and lets you review each level of dependencies independently. Such children should be properly identified (if not used elsewhere) after the parent has been deleted and you run the scripts again on the new data.

Recommendation: After removing confirmed orphans, run the analysis again. Components that were previously kept alive only by the deleted orphans may now be detected.

↑ Back to Table of Contents

Phase 8: View Orphan Detection (view_orphan_detector.py)

Purpose: Identifies user-created views that are not reachable from any root view (primary navigation or menu) and system views that are unused, using navigation edge data from Phase 7.

Key Capabilities

Navigation Graph Construction

Builds reachability graph from pre-computed navigation edges, using canonical name resolution for case-insensitive matching.

Root View Identification

Determines application entry points:

Breadth-First Search Reachability

Traverses navigation graph to find all reachable views:

Output Files

↑ Back to Table of Contents

Phase 9: Virtual Column Orphan Detection (column_orphan_detector.py)

Purpose: Identifies virtual columns that are not referenced by any other components, filtering out system-generated REF_ROWS columns and label columns that are actively displayed.

Key Capabilities

System-Generated Column Detection

Identifies and excludes AppSheet-created reverse references:

Label Column Intelligence

Sophisticated logic for label columns used in UI:

Multi-Source Reference Counting

Checks references across all component types:

Output Format

The detector generates potential_virtual_column_orphans.csv preserving all original column fields plus reference counts from each component type.

↑ Back to Table of Contents

Phase 10: Action Orphan Detection (actions_orphan_detector.py)

Purpose: Identifies user-created actions that are not referenced by other actions, not used in views, and critically detects actions that are unreachable due to being placed after unconditional navigation in group action sequences.

Key Capabilities

Unreachable Action Detection

Revolutionary feature that identifies actions placed after navigation in sequences:

Bot Action Exclusion

Searches multiple locations for bot-referenced actions:

The bot_actions.txt file is manually created with one action name per line, listing actions invoked by bots.

View-Type-Specific Visibility

Complex visibility rules per view type:

Output Format

The detector generates potential_action_orphans.csv with special notes for unreachable actions: "UNREACHABLE - Remove from: Group1, Group2 before deleting"

↑ Back to Table of Contents

Phase 11: Format Rule Orphan Detection (format_rules_orphan_detector.py)

Purpose: Identifies format rules that may be orphaned or unused, checking whether the columns and actions they format actually exist and are visible in the application.

Key Capabilities

Multi-Level Validation

The detector validates format rules at multiple levels:

Slice-Aware Checking

When a format rule targets a slice:

Orphan Classification

A format rule is considered orphaned if:

Integration with Other Phases

The detector optionally loads:

Output Format

The detector generates potential_format_rule_orphans.csv containing all original format rule fields plus orphan status and formatted item counts.

↑ Back to Table of Contents

Phase 12: Slice Orphan Detection (slice_orphan_detector.py)

Purpose: Identifies slices that are not referenced anywhere in the application, helping developers clean up unused data layer components that may impact performance and maintainability.

Key Capabilities

Comprehensive Reference Search

Unlike simple name matching, the detector searches for slices in:

Formula Reference Detection

Sophisticated pattern matching for slice references in AppSheet formulas:

Multi-Component Scanning

Checks for slice references in:

Output Format

The detector generates potential_slice_orphans.csv preserving all original slice fields plus orphan status and reference count.

↑ Back to Table of Contents

Phase 13: Phantom View Reference Detection (phantom_view_reference_detector.py)

Purpose: Identifies references to non-existent views in AppSheet formulas across all application components, catching configuration errors that would cause runtime failures when users trigger navigation actions.

Key Capabilities

Unicode Normalization System

Handles international characters and smart quotes that often cause phantom references:

Comprehensive Reference Extraction

Sophisticated regex patterns for all AppSheet navigation functions:

Hybrid Detection Strategy

Uses pre-parsed action_targets.csv when available for accuracy, falls back to regex parsing for flexibility.

Debugging Value

The Phantom View Reference Detection script is also useful for debugging the action_target_parser script. In the development process, looking at phantom detection results can help identify parsing failures. Even if someone isn't particularly interested in removing phantom view destinations, the results can reveal parsing issues.

Output Format

The detector generates potential_phantom_view_references.csv with frequency analysis showing which phantom views appear most often.

↑ Back to Table of Contents

Phase 14: Dependency Exploration

Phase 14 provides interactive tools for exploring complex dependencies after all parsing and orphan detection completes.

14-1: Dependency Analysis Hub (dependency_analyzer_hub.py)

Purpose: Provides a centralized menu interface for accessing three specialized dependency analyzers, enabling deep exploration of component relationships within the AppSheet application.

Key Features

Menu Options

  1. Column dependencies
  2. Action dependencies
  3. View dependencies
  4. Exit
↑ Back to Table of Contents

14-2: Column Dependency Analyzer (column_dependency_analyzer.py)

Purpose: Interactive analyzer that traces all dependencies for a selected column across the entire AppSheet application, showing which components reference or depend on it through various mechanisms.

Key Capabilities

Flexible Column Search

Reference Categorization

Identifies HOW a column references the target across multiple formula fields:

Multi-Component Analysis

Analyzes column usage in:

Interactive Navigation

Three-tier menu system:

  1. Search for column
  2. View dependency summary
  3. Explore detailed dependencies by type
↑ Back to Table of Contents

14-3: Action Dependency Analyzer (action_dependency_analyzer.py)

Purpose: Interactive analyzer for exploring action dependencies, relationships, and invocation chains throughout the AppSheet application.

Navigation Methods

  1. Search by name (partial match)
  2. Browse by table
  3. Browse by action type
  4. Browse all actions
  5. Analyze action chains

Key Capabilities

Action Chain Analysis

Ancestry Tracing

Discovers all paths that lead to a target action:

View Integration Analysis

Sophisticated visibility detection:

↑ Back to Table of Contents

14-4: View Path Analyzer (view_dependency_analyzer.py)

Purpose: Interactive analyzer that traces navigation paths TO selected views from entry points and shows immediate destinations FROM them.

Key Capabilities

Bidirectional Navigation Analysis

Entry Point Identification

Categorizes views by accessibility:

Path Finding Algorithm

Breadth-first search with:

Output Display

Shows navigation paths in tree format with contextual information about how navigation occurs (action names, event types, prominence settings).

↑ Back to Table of Contents