Scanning & Validation
Extract webMCP elements from local HTML files and live URLs, then validate the output for quality and compliance.
Scanning Capabilities
The scanner extracts and analyzes web elements in three modes.
HTML File Scanning
Scan local HTML files and extract webMCP elements
URL Scanning
Scan live websites and extract interactive elements
Batch Processing
Process multiple files or URLs simultaneously
Element Detection
Elements are detected in three groups, each with its own extraction priority.
Form Elements
Input fields, buttons, selects, textareas
input, button, select, textarea, form
High Priority
Extraction Details:
- Element type and attributes
- Form validation rules
- Labels and placeholders
- Semantic roles and purposes
Interactive Elements
Links, clickable elements, navigation
a[href], [onclick], [role="button"]
Medium Priority
Extraction Details:
- Link destinations and purposes
- Click handlers and actions
- Navigation structure
- Interactive state information
Content Elements
Headers, paragraphs, lists, tables
h1, h2, h3, h4, h5, h6, p, ul, ol, table
Low Priority
Extraction Details:
- Content hierarchy and structure
- Text content and meaning
- Data tables and relationships
- List structures and ordering
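The three priority groups above can be illustrated with a short sketch. This is not the scanner's actual implementation: it is a minimal approximation using Python's standard-library HTML parser, and the names classify, PriorityScanner, and scan are hypothetical. It also assumes that form selectors are checked first, then interactive ones, then content, so an element matching several groups takes the highest priority.

```python
from html.parser import HTMLParser

# Tag groups copied from the selector lists in this section.
FORM_TAGS = {"input", "button", "select", "textarea", "form"}
CONTENT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "table"}
ORDER = {"high": 0, "medium": 1, "low": 2}


def classify(tag, attrs):
    """Return 'high', 'medium', 'low', or None for one start tag."""
    if tag in FORM_TAGS:
        return "high"
    # Approximates a[href], [onclick], [role="button"].
    if (tag == "a" and "href" in attrs) or "onclick" in attrs \
            or attrs.get("role") == "button":
        return "medium"
    if tag in CONTENT_TAGS:
        return "low"
    return None


class PriorityScanner(HTMLParser):
    """Collects (priority, tag) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        priority = classify(tag, dict(attrs))
        if priority is not None:
            self.found.append((priority, tag))


def scan(html):
    scanner = PriorityScanner()
    scanner.feed(html)
    scanner.close()
    # Stable sort: high first, document order kept within each priority.
    return sorted(scanner.found, key=lambda item: ORDER[item[0]])
```

Calling scan() on a page's markup returns the matched tags grouped by priority, which is roughly the ordering the --min-priority filter shown later would cut against.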
Scanning Examples
Practical examples for different scanning scenarios
Basic HTML Scanning
Scan a simple login form
# Scan local HTML file
webmcp scan login.html --output login.wmcp --verbose
# Output will show:
# ✓ Found 3 form elements
# ✓ Found 1 button element
# ✓ Generated login.wmcp with 4 webMCP elements
# ✓ Token optimization potential: 67.6%
URL Scanning with Context
Scan live website with URL context
# Scan live website
webmcp scan https://example.com/signup \
--output signup.wmcp \
--format wmcp \
--verbose
# Include URL context for better optimization
webmcp scan signup.html \
--url https://example.com/signup \
--output signup-optimized.wmcp
Batch Processing
Process multiple files efficiently
# Process all HTML files in directory
webmcp scan "forms/**/*.html" \
--output-dir ./webmcp-output \
--format json \
--recursive
# Process URLs from file
echo "https://example.com/login
https://example.com/signup
https://example.com/contact" > urls.txt
webmcp scan urls.txt --batch --format wmcp
Advanced Configuration
Custom scanning with filters and options
# Scan with custom element filters
webmcp scan page.html \
--elements "input,button,select" \
--exclude-classes "hidden,disabled" \
--min-priority medium \
--output filtered.wmcp
# Scan with optimization preview
webmcp scan form.html \
--preview-optimization \
--target-model gpt-4o \
--compression-level advanced
Validation Rules
Comprehensive validation ensures quality and compliance
Element Detection
Validates that interactive elements are properly identified
- Form inputs have proper names and types
- Buttons have descriptive text or labels
- Links have meaningful href attributes
- Interactive elements have semantic roles
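To make the four checks above concrete, here is a rough sketch of how they could be expressed. It is an illustration under stated assumptions, not the validator webMCP ships: DetectionValidator and validate are hypothetical names, a "meaningful" href is assumed to mean anything other than an empty value, "#", or a javascript: URL, and a missing type attribute is reported even though browsers default it to text.

```python
from html.parser import HTMLParser


class DetectionValidator(HTMLParser):
    """Collects human-readable issues for the four rules listed above."""

    def __init__(self):
        super().__init__()
        self.issues = []
        self._button = None  # [has_label_attr, text] while inside <button>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            if not a.get("name"):
                self.issues.append("input missing name")
            if not a.get("type"):
                self.issues.append("input missing type")
        elif tag == "a":
            href = (a.get("href") or "").strip()
            if href in ("", "#") or href.lower().startswith("javascript:"):
                self.issues.append("link without meaningful href")
        elif tag == "button":
            self._button = [bool(a.get("aria-label") or a.get("title")), ""]
        # Click handlers on generic elements should declare a role.
        if "onclick" in a and tag not in ("a", "button", "input") \
                and not a.get("role"):
            self.issues.append(f"clickable <{tag}> missing role")

    def handle_data(self, data):
        if self._button is not None:
            self._button[1] += data

    def handle_endtag(self, tag):
        if tag == "button" and self._button is not None:
            has_label, text = self._button
            if not has_label and not text.strip():
                self.issues.append("button without text or label")
            self._button = None


def validate(html):
    validator = DetectionValidator()
    validator.feed(html)
    validator.close()
    return validator.issues
```

An empty list means the markup passed all four checks; each string otherwise names one failed rule in document order.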
Accessibility Compliance
Ensures elements meet accessibility standards
- Form inputs have associated labels
- Interactive elements have ARIA attributes
- Focus management is properly configured
- Screen reader compatibility is maintained
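The first rule, that form inputs have associated labels, is the easiest one to sketch. The example below is an assumption about what "associated" could mean rather than a description of the real validator: a control counts as labelled if a label's for attribute matches its id, if it is nested inside a label, or if it carries aria-label or aria-labelledby. LabelAudit and unlabeled_controls are hypothetical names.

```python
from html.parser import HTMLParser

# Input types that do not need a visible label (assumption for this sketch).
SKIP_TYPES = {"hidden", "submit", "button", "reset", "image"}


class LabelAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.label_targets = set()  # values of <label for="...">
        self.controls = []          # (description, id, labelled_directly)
        self._label_depth = 0       # > 0 while inside a <label>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self._label_depth += 1
            if a.get("for"):
                self.label_targets.add(a["for"])
        elif tag in ("input", "select", "textarea"):
            if tag == "input" and a.get("type") in SKIP_TYPES:
                return
            direct = bool(self._label_depth or a.get("aria-label")
                          or a.get("aria-labelledby"))
            description = a.get("name") or a.get("id") or tag
            self.controls.append((description, a.get("id"), direct))

    def handle_endtag(self, tag):
        if tag == "label" and self._label_depth:
            self._label_depth -= 1


def unlabeled_controls(html):
    """Return descriptions of controls with no label association."""
    audit = LabelAudit()
    audit.feed(html)
    audit.close()
    # label[for] may appear after the control, so resolve ids at the end.
    return [desc for desc, cid, direct in audit.controls
            if not direct and cid not in audit.label_targets]
```

Focus management and screen reader behaviour depend on runtime state and cannot be checked from static markup in this way.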
webMCP Schema
Validates output against webMCP schema
- Required fields are present
- Data types match schema definitions
- Relationships between elements are valid
- Security tokens are properly generated
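The first three schema checks can be pictured as follows. The webMCP schema itself is not reproduced in this section, so every field name here (id, type, selector, priority, parent) is a hypothetical stand-in chosen for illustration, as is the validate_elements function. Security token checks are left out because nothing in this section describes their format.

```python
# HYPOTHETICAL schema: field names and types are illustrative only,
# not taken from the webMCP specification.
REQUIRED_FIELDS = {"id": str, "type": str, "selector": str, "priority": str}
PRIORITIES = {"high", "medium", "low"}


def validate_elements(elements):
    """Return a list of error strings for a list of element dicts."""
    errors = []
    known_ids = {el.get("id") for el in elements
                 if isinstance(el.get("id"), str)}
    for i, el in enumerate(elements):
        # Required fields are present and data types match.
        for field, expected in REQUIRED_FIELDS.items():
            if field not in el:
                errors.append(f"element {i}: missing '{field}'")
            elif not isinstance(el[field], expected):
                errors.append(
                    f"element {i}: '{field}' should be {expected.__name__}")
        priority = el.get("priority")
        if isinstance(priority, str) and priority not in PRIORITIES:
            errors.append(f"element {i}: unknown priority '{priority}'")
        # Relationships between elements are valid.
        parent = el.get("parent")
        if parent is not None and parent not in known_ids:
            errors.append(f"element {i}: parent '{parent}' not found")
    return errors
```

An empty result means every element has the assumed required fields with the right types and points only at parents that exist in the same document.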
Best Practices
Tips for effective scanning and validation
Include URL Context
When scanning a local HTML file, pass its source URL with --url for better optimization results.
Use Verbose Output
Enable verbose mode to understand what elements are being detected and why.
Validate Immediately
Run validation immediately after scanning to catch issues early.
Large File Performance
For large files (over 1 MB), consider narrowing the scan with element filters such as --elements to improve performance.
Batch Processing
Use batch processing for multiple files to take advantage of parallelization.
Preview Optimization
Use --preview-optimization to see potential token savings before generating output.
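If you drive the CLI from a script, the URL-context and verbose-output tips can be folded into one small helper. This is only a sketch: build_scan_command is a hypothetical name, it uses just the scan flags that appear in the examples on this page (--output, --url, --verbose), and it assumes those flags can be combined in a single invocation.

```python
def build_scan_command(html_path, output, url=None, verbose=True):
    """Build the argument list for a `webmcp scan` call.

    Assumes --output, --url and --verbose may be used together.
    """
    cmd = ["webmcp", "scan", html_path, "--output", output]
    if url:
        # URL context for a local file, as in the signup example.
        cmd += ["--url", url]
    if verbose:
        cmd.append("--verbose")
    return cmd


# Usage (requires webmcp on PATH):
#   import subprocess
#   subprocess.run(build_scan_command(
#       "signup.html", "signup.wmcp",
#       url="https://example.com/signup"), check=True)
```

Returning a list rather than a shell string avoids quoting problems when file names or URLs contain spaces or special characters.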