The AppSheet Parser Suite addresses a critical limitation in AppSheet development: while the platform's HTML documentation export contains comprehensive information about an application's structure, this information is difficult to navigate, search, and understand. This suite transforms that HTML export into structured CSV data that can be analyzed, queried, and used to identify unused elements. It provides AppSheet developers with insights that are otherwise difficult to obtain, particularly for large, complex applications that may have accumulated technical debt over years of development. By producing standardized CSV outputs and publishing the complete source code on GitHub, the suite provides a foundation that other developers can build upon - whether extending the parsing capabilities, creating new analysis tools, or integrating the data into their own workflows.
1. Modular Architecture - The overwhelming complexity of parsing AppSheet's HTML documentation has been divided into 18+ specialized Python scripts, each handling a specific parsing or analysis task. This division makes debugging feasible and allows other developers to understand, modify, or extend individual components without grappling with the entire system.
2. Working Around HTML Deficiencies - AppSheet's HTML export lacks critical metadata: actions don't indicate whether they're system-generated or user-created, view categorization and navigation positions are unclear, and bot configurations are entirely absent. The suite includes interactive validation systems that guide users through obtaining this missing data from the AppSheet editor interface, creating supplementary text files that augment the incomplete HTML.
3. Self-Documenting CSV Design - Every parser produces CSV files with consistent field ordering and uses a triple-pipe delimiter (|||) for multi-value fields. This standardization enables both human analysis (the CSVs can be opened in Excel) and programmatic processing by downstream tools. The CSV format serves as both a final result and an intermediate data structure for further analysis.
The suite is particularly valuable for:
Authors: Kirk Masden & Claude
Language: Python 3.6+ (tested with Python 3.13.2)
Dependencies: BeautifulSoup4 for HTML parsing
Environment: Python virtual environment required
Download the AppSheet Parser Suite from GitHub: https://github.com/KirkMasden/appsheet_parser_and_orphan_detector
Click the green "Code" button and select "Download ZIP". This will download all the scripts in a single compressed file.
Before organizing your folders, you need to obtain the HTML documentation from AppSheet and create several supporting files that provide information missing from the HTML export.
These instructions are for the new AppSheet editor (for legacy editor, steps differ):
Application Documentation.html)Important: Do not rename the file when saving. Keep the original name as provided by AppSheet. The file will be differentiated from other similar HTML files by the name of the folder you will put it in (see directions below).
This file identifies which actions are system-generated versus user-created:
actions.txtVideo Tutorial: Watch how to create actions.txt:
These files help identify system views and navigation structure. You need to create TWO separate files:
For views1.txt:
views1.txtFor views2.txt:
views2.txtIf your app uses bots that trigger actions, create this file to prevent those actions from being incorrectly identified as orphans:
bot_actions.txtExample bot_actions.txt:
Send_Email_Notification
Update_Status
Archive_Old_Records
Now that you have your text files ready, create a main project folder on your computer (for example: AppSheetAnalysis). Inside this folder:
appsheet_parser_and_orphan_detector-main. Move this folder into your main project folder.MyApp_Data) and place it in your main project folder. Put the following files in this data folder:
actions.txt, views1.txt, views2.txtbot_actions.txtYour folder structure should now look like this:
AppSheetAnalysis/
├── appsheet_parser_and_orphan_detector-main/ (the extracted scripts)
└── MyApp_Data/ (your data folder)
├── Application Documentation.html
├── actions.txt
├── views1.txt
├── views2.txt
└── bot_actions.txt (optional)
Open a terminal (Mac/Linux) or Command Prompt (Windows). Navigate to your main project folder using the cd command:
# Change to your project directory (replace with your actual path)
cd ~/Documents/AppSheetAnalysis# Create a Python virtual environment
python3 -m venv venv# Activate the virtual environment
source venv/bin/activate# Install the required package
pip install beautifulsoup4
# Change to your project directory (replace with your actual path)
cd C:\Users\YourName\Documents\AppSheetAnalysis# Create a Python virtual environment
python -m venv venv# Activate the virtual environment
venv\Scripts\activate# Install the required package
pip install beautifulsoup4
Note: The cd command means "change directory". Replace the example paths with the actual location where you created your project folder.
After setting up the Python environment, your folder structure will have three folders:
AppSheetAnalysis/
├── appsheet_parser_and_orphan_detector-main/ (the Python scripts)
├── MyApp_Data/ (your data folder)
└── venv/ (the Python virtual environment - created automatically)
Follow these steps each time you want to analyze your AppSheet data:
# 1. Navigate to your project folder
cd ~/Documents/AppSheetAnalysis# 2. Activate the virtual environment
source venv/bin/activate# 3. Run the parser (replace filenames with your actual names)
python appsheet_parser_and_orphan_detector-main/master_parser_and_orphan_detector.py "MyApp_Data/Application Documentation.html"
# 1. Navigate to your project folder
cd C:\Users\YourName\Documents\AppSheetAnalysis# 2. Activate the virtual environment
venv\Scripts\activate# 3. Run the parser (replace filenames with your actual names)
python appsheet_parser_and_orphan_detector-main\master_parser_and_orphan_detector.py "MyApp_Data\Application Documentation.html"
The script will:
20250107_143022_YourApp_parse)Important: Always run these three steps in order. The virtual environment must be activated (you'll see (venv) in your terminal prompt) before running the Python scripts.
When parsing completes, the script asks "Would you like to explore dependencies now?" If you answer "n", the script displays a command you can copy and use later. This command includes the full paths specific to your setup, so it will work from any directory after you activate your virtual environment.
source venv/bin/activate or venv\Scripts\activate) before running scripts(venv) when activateddeactivateThe suite consists of 18+ Python scripts organized into four categories:
The system processes data in strict dependency order, with each phase potentially using outputs from previous phases. For example, the slice parser must run first because other parsers need the slice-to-table mappings it generates.
AppSheet's HTML documentation export, while comprehensive, lacks certain critical metadata that must be obtained through supplementary text files:
These text files compensate for deficiencies in AppSheet's HTML export and enable accurate orphan detection and component categorization.
The navigation analysis is deliberately split into two phases:
action_targets.csvThis separation provides:
Purpose: Coordinates the entire workflow, ensuring scripts run in the correct order with proper data dependencies satisfied.
# Full parse with orphan detection
python master_parser_and_orphan_detector.py "Application Documentation.html" -o output_dir# Single parser mode
python master_parser_and_orphan_detector.py "Application Documentation.html" -v # views only# Orphan detection only (on existing parsed data)
python master_parser_and_orphan_detector.py "Application Documentation.html" -ao # action orphans
Each phase is wrapped in try/except blocks to prevent cascade failures. The orchestrator logs errors but continues with remaining phases, returning partial results even if some parsers fail.
After completion, reads all generated CSV files to provide:
Purpose: Abstract base class that provides common functionality for all parser scripts, ensuring consistent reference extraction, HTML handling, and data normalization across the suite.
The reference extraction is the heart of the parser suite. It identifies where columns are referenced throughout the application:
Pattern Detection:
Slice Resolution:
JSON Configuration Parsing:
The suite uses ||| as the delimiter for multi-value fields throughout all CSV outputs. This is critical because AppSheet allows commas in component names, formulas, and column lists, which makes comma-based splitting unreliable. The triple-pipe pattern was chosen as it's extremely unlikely to appear in AppSheet formulas or user-entered text.
extract_references_from_text(text, context_table=None)
Returns list of reference dictionaries containing type, table, column, and metadata.
extract_references_from_json(json_str, context_table=None)
Parses JSON and recursively extracts references with path tracking.
resolve_table_reference(table_or_slice_name)
Core slice-to-table resolution logic with case-insensitive matching.
build_absolute_references(references)
Converts all references to Table[Column] format for consistency.
All child parsers must implement a parse() method that returns a list of dictionaries representing parsed components.
Purpose: First parser in the chain that extracts slice definitions and creates the critical slice-to-table mapping used by all subsequent parsers. Also performs a complete inventory of all tables in the application.
Unlike other parsers, slice_parser performs a comprehensive table inventory:
For each slice, extracts:
The two-pass strategy ensures that slice formulas are processed only after the complete slice_to_table_map is constructed. This is essential for resolving expressions that contain nested or indirect references to other slices.
The slice_to_table_map dictionary is the most important output, enabling all downstream parsers to:
The parser generates appsheet_slices.csv with fields including slice_name, source_table, referenced_columns, row_filter_condition, slice_columns, slice_actions, and more. Multi-value fields use the ||| delimiter.
Purpose: Extracts all column definitions from the HTML schema sections, properly associates them with tables (resolving slice references), and identifies system-generated tables to exclude from analysis.
The parser performs multi-level reference extraction from:
Identifies columns that are always hidden:
The parser generates appsheet_columns.csv with comprehensive column metadata including table_name, column_name, unique_identifier, is_virtual, type, referenced_columns, and all formula fields.
Purpose: Extracts format rules (conditional formatting) that control the visual appearance of columns and actions based on conditions. Tracks which columns are referenced in rule conditions.
Separates columns from actions in the formatted items list:
Converts JSON settings to human-readable format, including:
The parser only extracts column references from the condition field, not from the list of formatted items. This is intentional because formatted items are targets (what gets styled), not dependencies. Only the condition determines when formatting applies.
Purpose: Extracts all actions from the HTML, determines system-generated vs user-created status using the actions.txt helper file, and tracks action dependencies including grouped action sequences.
The parser provides step-by-step guidance when actions.txt is missing:
Handles two formats:
Primary method using JSON structure analysis to identify:
Extracts referenced actions from grouped actions and maintains action_dependencies dictionary for chain analysis.
Three-tier matching strategy:
The parser generates appsheet_actions.csv with fields including action_name, source_table, action_type, referenced_columns, referenced_actions, navigate_target, and is_system_generated status.
Purpose: Extracts all view definitions from the HTML, enriches them with data from external views text files, and tracks which columns and actions each view displays and references.
The parser provides comprehensive guidance for obtaining required view mapping files:
Compares file sizes to identify which contains system views:
The parser loads data from multiple Phase outputs to enrich view information:
Critical distinction between what's accessible and what's shown:
The parser generates appsheet_views.csv with comprehensive view metadata including view_name, view_type, category, data_source, available_columns, view_columns, available_actions, referenced_actions, and more.
Purpose: Parses navigation expressions from actions to extract structured navigation targets, preparing data for edge generation by identifying where each action navigates to and under what conditions.
The parser handles multiple navigation patterns:
Identifies where actions should be visible/active:
The parser merges conditions from two sources:
Purpose: Generates navigation edges from parsed action targets, determining which views can navigate to which other views by validating action visibility, context conditions, and expanding group actions.
The generator implements view-type-specific visibility rules:
Recursively expands group actions with condition inheritance:
Validates six types of context requirements:
The generator creates navigation_edges.csv containing all valid navigation paths with comprehensive metadata including source_view, target_view, source_action, context conditions, and availability type.
Orphan detection identifies components that appear unused, but there are two important concepts to understand when interpreting results.
The suite first identifies views that are unreachable from the app's root views, that is, views in the primary navigation bar or side menu that users can access directly (Phase 8). This matters because actions, columns, and format rules that appear only on unreachable views will be flagged as potential orphans in subsequent phases.
This is intentional: if users cannot navigate to a view, they cannot access what's on it. However, if you're still building your app and haven't connected all views to the menu yet, components on those unconnected views will appear as orphans.
Tip: Check the contents of "potential_view_orphans.csv" first. If a view you're still developing appears as an orphan, the actions and columns in that view may also be flagged if they are not used in an accessible view.
The suite does not automatically flag "child" components of orphaned items. For example:
This conservative approach prevents cascading false positives and lets you review each level of dependencies independently. Such children should be properly identified (if not used elsewhere) after the parent has been deleted and you run the scripts again on the new data.
Recommendation: After removing confirmed orphans, run the analysis again. Components that were previously kept alive only by the deleted orphans may now be detected.
Purpose: Identifies user-created views that are not reachable from any root view (primary navigation or menu) and system views that are unused, using navigation edge data from Phase 7.
Builds reachability graph from pre-computed navigation edges, using canonical name resolution for case-insensitive matching.
Determines application entry points:
Traverses navigation graph to find all reachable views:
Purpose: Identifies virtual columns that are not referenced by any other components, filtering out system-generated REF_ROWS columns and label columns that are actively displayed.
Identifies and excludes AppSheet-created reverse references:
Sophisticated logic for label columns used in UI:
Checks references across all component types:
The detector generates potential_virtual_column_orphans.csv preserving all original column fields plus reference counts from each component type.
Purpose: Identifies user-created actions that are not referenced by other actions, not used in views, and critically detects actions that are unreachable due to being placed after unconditional navigation in group action sequences.
Revolutionary feature that identifies actions placed after navigation in sequences:
Searches multiple locations for bot-referenced actions:
The bot_actions.txt file is manually created with one action name per line, listing actions invoked by bots.
Complex visibility rules per view type:
The detector generates potential_action_orphans.csv with special notes for unreachable actions: "UNREACHABLE - Remove from: Group1, Group2 before deleting"
Purpose: Identifies format rules that may be orphaned or unused, checking whether the columns and actions they format actually exist and are visible in the application.
The detector validates format rules at multiple levels:
When a format rule targets a slice:
A format rule is considered orphaned if:
The detector optionally loads:
potential_view_orphans.csv from Phase 8 to exclude orphaned viewsunused_system_views.csv from Phase 8 to exclude unused system viewsThe detector generates potential_format_rule_orphans.csv containing all original format rule fields plus orphan status and formatted item counts.
Purpose: Identifies slices that are not referenced anywhere in the application, helping developers clean up unused data layer components that may impact performance and maintainability.
Unlike simple name matching, the detector searches for slices in:
Sophisticated pattern matching for slice references in AppSheet formulas:
Checks for slice references in:
The detector generates potential_slice_orphans.csv preserving all original slice fields plus orphan status and reference count.
Purpose: Identifies references to non-existent views in AppSheet formulas across all application components, catching configuration errors that would cause runtime failures when users trigger navigation actions.
Handles international characters and smart quotes that often cause phantom references:
Sophisticated regex patterns for all AppSheet navigation functions:
Uses pre-parsed action_targets.csv when available for accuracy, falls back to regex parsing for flexibility.
The Phantom View Reference Detection script is also useful for debugging the action_target_parser script. In the development process, looking at phantom detection results can help identify parsing failures. Even if someone isn't particularly interested in removing phantom view destinations, the results can reveal parsing issues.
The detector generates potential_phantom_view_references.csv with frequency analysis showing which phantom views appear most often.
Phase 14 provides interactive tools for exploring complex dependencies after all parsing and orphan detection completes.
Purpose: Provides a centralized menu interface for accessing three specialized dependency analyzers, enabling deep exploration of component relationships within the AppSheet application.
Purpose: Interactive analyzer that traces all dependencies for a selected column across the entire AppSheet application, showing which components reference or depend on it through various mechanisms.
Identifies HOW a column references the target across multiple formula fields:
Analyzes column usage in:
Three-tier menu system:
Purpose: Interactive analyzer for exploring action dependencies, relationships, and invocation chains throughout the AppSheet application.
Discovers all paths that lead to a target action:
Sophisticated visibility detection:
Purpose: Interactive analyzer that traces navigation paths TO selected views from entry points and shows immediate destinations FROM them.
Categorizes views by accessibility:
Breadth-first search with:
Shows navigation paths in tree format with contextual information about how navigation occurs (action names, event types, prominence settings).