Scanning & Validation

Extract webMCP elements from HTML files and URLs with intelligent scanning and comprehensive validation.

Scanning Capabilities

The scanning engine extracts and analyzes web elements from local HTML files, live URLs, and batches of either.

HTML File Scanning

Scan local HTML files and extract webMCP elements

webmcp scan login.html --output login.wmcp
webmcp scan forms/*.html --format json
webmcp scan index.html --verbose --url https://mysite.com

URL Scanning

Scan live websites and extract interactive elements

webmcp scan https://example.com/login
webmcp scan https://shop.example.com/checkout --format wmcp
webmcp scan https://app.example.com --output app-elements.json

Batch Processing

Process multiple files or URLs simultaneously

webmcp scan "src/**/*.html" --output-dir ./webmcp
webmcp scan urls.txt --batch --format json
webmcp scan forms/ --recursive --verbose

Element Detection

Elements are detected by selector and extracted in priority order: form elements first, then interactive elements, then content elements.

Form Elements

Input fields, buttons, selects, textareas

Selector: input, button, select, textarea, form

Priority: High

Extraction Details:

  • Element type and attributes
  • Form validation rules
  • Labels and placeholders
  • Semantic roles and purposes

Interactive Elements

Links, clickable elements, navigation

Selector: a[href], [onclick], [role="button"]

Priority: Medium

Extraction Details:

  • Link destinations and purposes
  • Click handlers and actions
  • Navigation structure
  • Interactive state information

Content Elements

Headers, paragraphs, lists, tables

Selector: h1, h2, h3, h4, h5, h6, p, ul, ol, table

Priority: Low

Extraction Details:

  • Content hierarchy and structure
  • Text content and meaning
  • Data tables and relationships
  • List structures and ordering
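The three priority tiers above can be approximated from the shell for a quick look at what a page contains before scanning it. The sketch below is a rough illustration, not webmcp's detection logic: it counts opening tags with grep, using regular expressions that only loosely mirror the listed selectors, and it misses tags that are split across lines. The function names count_matches and tally_priorities are hypothetical.

```shell
# Rough pre-scan tally of elements per priority tier.
# Assumption: regexes stand in for the CSS selectors listed above.

count_matches() {
  # $1 = extended regex, $2 = HTML file; prints the number of matches
  grep -oiE "$1" "$2" | wc -l | tr -d ' '
}

tally_priorities() {
  file=$1
  # High: input, button, select, textarea, form
  high=$(count_matches '<(input|button|select|textarea|form)[[:space:]>/]' "$file")
  # Medium: a[href], [onclick], [role="button"]
  # (a <button onclick=...> is counted in both the high and medium tiers)
  medium=$(count_matches '<a[[:space:]][^>]*href=|<[a-z0-9]+[[:space:]][^>]*(onclick=|role="button")' "$file")
  # Low: h1-h6, p, ul, ol, table
  low=$(count_matches '<(h[1-6]|p|ul|ol|table)[[:space:]>/]' "$file")
  echo "high=$high medium=$medium low=$low"
}
```

Running tally_priorities login.html prints a single line of counts, which can be compared against the totals reported by a verbose scan of the same file.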

Scanning Examples

Practical examples for different scanning scenarios

Basic HTML Scanning

Scan a simple login form

# Scan local HTML file
webmcp scan login.html --output login.wmcp --verbose

# Output will show:
# ✓ Found 3 form elements
# ✓ Found 1 button element
# ✓ Generated login.wmcp with 4 webMCP elements
# ✓ Token optimization potential: 67.6%

URL Scanning with Context

Scan live website with URL context

# Scan live website
webmcp scan https://example.com/signup \
  --output signup.wmcp \
  --format wmcp \
  --verbose

# Include URL context for better optimization
webmcp scan signup.html \
  --url https://example.com/signup \
  --output signup-optimized.wmcp

Batch Processing

Process multiple files efficiently

# Process all HTML files in directory
webmcp scan "forms/**/*.html" \
  --output-dir ./webmcp-output \
  --format json \
  --recursive

# Process URLs from file
echo "https://example.com/login
https://example.com/signup
https://example.com/contact" > urls.txt

webmcp scan urls.txt --batch --format wmcp
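A multi-line echo makes it easy to leave stray whitespace in the URL list. As an alternative sketch, printf can write one URL per line and a grep check can flag malformed entries before the batch scan runs. The pattern used for the check is an assumption about what a usable entry looks like, not a documented webmcp rule, and the scan itself only runs if a webmcp command is found on the PATH.

```shell
# Write one URL per line; printf adds no trailing spaces
printf '%s\n' \
  "https://example.com/login" \
  "https://example.com/signup" \
  "https://example.com/contact" > urls.txt

# List any line that is not a bare http(s) URL (assumed format check)
if grep -nvE '^https?://[^[:space:]]+$' urls.txt; then
  echo "urls.txt has malformed lines (listed above)" >&2
else
  echo "urls.txt OK: $(wc -l < urls.txt | tr -d ' ') URLs"
  # Run the batch scan only when the CLI is installed
  command -v webmcp >/dev/null 2>&1 && webmcp scan urls.txt --batch --format wmcp
fi
```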

Advanced Configuration

Custom scanning with filters and options

# Scan with custom element filters
webmcp scan page.html \
  --elements "input,button,select" \
  --exclude-classes "hidden,disabled" \
  --min-priority medium \
  --output filtered.wmcp

# Scan with optimization preview
webmcp scan form.html \
  --preview-optimization \
  --target-model gpt-4o \
  --compression-level advanced

Validation Rules

Comprehensive validation ensures quality and compliance

Element Detection

Validates that interactive elements are properly identified

  • Form inputs have proper names and types
  • Buttons have descriptive text or labels
  • Links have meaningful href attributes
  • Interactive elements have semantic roles
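The first rule in the list above can be spot-checked from the shell before running a scan. This is a minimal sketch of the idea, not webmcp's validator: it uses grep to list input tags that lack either a name or a type attribute, and it only sees tags written on a single line. The function name inputs_missing_attrs is hypothetical.

```shell
# Print "line:tag" for each single-line <input> tag that lacks
# a name attribute, a type attribute, or both.
inputs_missing_attrs() {
  # First grep: extract every <input ...> tag with its line number.
  # Second grep: drop the tags that carry both name= and type=.
  grep -noiE '<input[^>]*>' "$1" |
    grep -viE '[[:space:]]name=.*[[:space:]]type=|[[:space:]]type=.*[[:space:]]name='
}
```

Calling inputs_missing_attrs login.html prints nothing when every input carries both attributes, so an empty result is the passing case.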

Accessibility Compliance

Ensures elements meet accessibility standards

  • Form inputs have associated labels
  • Interactive elements have ARIA attributes
  • Focus management is properly configured
  • Screen reader compatibility is maintained

webMCP Schema

Validates output against webMCP schema

  • Required fields are present
  • Data types match schema definitions
  • Relationships between elements are valid
  • Security tokens are properly generated

Best Practices

Tips for effective scanning and validation

Include URL Context

When scanning a local HTML file, pass --url with the address the page is served from; the URL context gives better optimization results.

Use Verbose Output

Enable verbose mode to understand what elements are being detected and why.

Validate Immediately

Run validation immediately after scanning to catch issues early.

Large File Performance

For large files (> 1 MB), consider using element filters to improve performance.
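One way to apply this tip is to find the large files first and scan only those with filters. The sketch below assumes the threshold means 1,048,576 bytes and reuses the --elements filter from the Advanced Configuration example; the function names large_html_files and suggest_filtered_scans are hypothetical. It prints suggested commands rather than running them.

```shell
# List HTML files larger than 1 MiB under a directory (default: current)
large_html_files() {
  # -size +1048576c selects files of more than 1048576 bytes
  find "${1:-.}" -type f -name '*.html' -size +1048576c
}

# Print a filtered scan command for each large file found
suggest_filtered_scans() {
  large_html_files "$1" | while IFS= read -r file; do
    printf 'webmcp scan "%s" --elements "input,button,select" --output "%s.wmcp"\n' \
      "$file" "${file%.html}"
  done
}
```

Running suggest_filtered_scans forms prints one command per oversized file, which can be reviewed and then executed.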

Batch Processing

Use batch processing for multiple files to take advantage of parallelization.

Preview Optimization

Use --preview-optimization to see potential token savings before generating output.
