XOBuilder Extractor
A web application that converts HTML and CSS into JSON layouts through a multi-step flow with AI-assisted conversion and validation.
🔥 Overview
XOBuilder Extractor transforms HTML and CSS code into structured JSON layouts through a visual, multi-step processing flow. It uses ReactFlow to display the flow as an interactive diagram and integrates with OpenAI for content conversion and validation.
🏗️ Architecture
The project follows Clean Architecture principles with clear separation of concerns:
```
src/
├── application/ # Business logic and use cases
│ ├── dto/ # Data Transfer Objects
│ └── use-case/ # Application use cases
├── domains/ # Core business entities and models
│ └── entities/ # Domain entities
├── utils/ # Utility functions and helpers
├── constants/ # Application constants
└── index.ts # Main entry point and exported functions
```
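The layering above can be illustrated with a minimal TypeScript sketch. All names here (`SourceSection`, `ConversionResultDto`, `LayoutConverter`, `ConvertSectionUseCase`) are hypothetical and do not come from the codebase. What the sketch shows is the dependency direction: the use case depends on an interface, so the AI client behind it can be swapped or stubbed.

```typescript
// Domain layer (domains/entities): a plain entity with no framework dependencies.
// Field names are illustrative assumptions, not the project's real entity.
interface SourceSection {
  id: string;
  html: string;
  css: string;
}

// Application layer (application/dto): the shape a use case returns to callers.
interface ConversionResultDto {
  sectionId: string;
  layout: unknown; // JSON layout produced by the conversion
}

// Port: the use case depends on this abstraction, not on a concrete AI client.
interface LayoutConverter {
  convert(html: string, css: string): unknown;
}

// Application layer (application/use-case): passes domain data through the port.
class ConvertSectionUseCase {
  private readonly converter: LayoutConverter;

  constructor(converter: LayoutConverter) {
    this.converter = converter;
  }

  execute(section: SourceSection): ConversionResultDto {
    return {
      sectionId: section.id,
      layout: this.converter.convert(section.html, section.css),
    };
  }
}

// Usage with a stub converter, as a unit test might do.
const stub: LayoutConverter = {
  // Counts opening <div tags as a stand-in for a real conversion.
  convert: (html) => ({ type: "section", childCount: html.split("<div").length - 1 }),
};
const useCase = new ConvertSectionUseCase(stub);
const dto = useCase.execute({ id: "s1", html: "<div></div><div></div>", css: "" });
```

Because only the port is referenced, an OpenAI-backed converter would live in an outer layer and be injected at the entry point.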
🎯 Core Features
📊 Visual Flow Processing
- Interactive Flow Visualization: Built with ReactFlow so the progress of each step can be monitored in an interactive diagram
- Real-time Status Updates: Visual feedback with animated borders and progress indicators
- Step-by-Step Execution: Clear progression through processing stages
🤖 AI-Powered Conversion
- OpenAI Integration: Intelligent HTML/CSS to JSON conversion
- Validation Loop: Automatic comparison and re-generation for accuracy
- Smart Retry Logic: Up to 5 automatic regenerations, each guided by feedback from the previous comparison
🎨 Advanced UI States
- Processing State: Animated running border with real-time counter
- Success State: Green success border indicating completion
- Failure State: Red error border for failed operations
- Connection Types:
  - Solid arrows for step-to-step flow
  - Dashed arrows for AI agent connections
🔄 Processing Flow
The application processes each input in four steps:
Step 1: Data Extraction 📥
Purpose: Extract and prepare source materials
- Input: A JSON file containing an array of objects
- Processing: Extract HTML, CSS, and screenshot data
- Output: Structured data ready for conversion
- Status: Foundation step; it must complete successfully before any later step can run
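A possible shape for Step 1, assuming each object in the input file has string fields named `html`, `css` and `screenshot`. The README does not specify the field names, so these, along with `extractRecords` and `isSourceRecord`, are assumptions. Malformed entries are skipped and counted instead of aborting the run, while a file that is not an array fails immediately.

```typescript
// Hypothetical record shape: the README only says each object carries HTML, CSS and a screenshot.
interface SourceRecord {
  html: string;
  css: string;
  screenshot: string; // assumed to be a base64 data URL or a file URL
}

// Type guard that checks the assumed fields are present and are strings.
function isSourceRecord(value: unknown): value is SourceRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.html === "string" && typeof v.css === "string" && typeof v.screenshot === "string";
}

// Step 1: parse the JSON file contents and keep only well-formed records.
// Throws if the top-level value is not an array, since no later step can run.
function extractRecords(fileContents: string): { records: SourceRecord[]; skipped: number } {
  const parsed: unknown = JSON.parse(fileContents);
  if (!Array.isArray(parsed)) {
    throw new Error("Input must be a JSON array of objects");
  }
  const records = parsed.filter(isSourceRecord);
  return { records, skipped: parsed.length - records.length };
}

const input = JSON.stringify([
  { html: "<div class='hero'></div>", css: ".hero{color:red}", screenshot: "data:image/png;base64,AAA" },
  { html: "<p></p>" }, // missing css and screenshot, so it is skipped
]);
const { records, skipped } = extractRecords(input);
```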
Step 2: AI Conversion 🤖
Purpose: Transform HTML/CSS to raw JSON layout
- Input: HTML and CSS from Step 1
- AI Agent: OpenAI integration (dashed connection)
- Processing: The OpenAI model converts the HTML/CSS into a JSON layout
- Output: Raw JSON layout structure
- Retry Logic: Supports up to 5 regeneration attempts
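Step 2 could sit behind a small provider interface, which also fits the pluggable-provider design mentioned under Scalability Considerations. The sketch is hypothetical: `AiLayoutProvider`, `buildConversionMessages`, `convertToRawLayout` and the prompt wording are illustrative, and the provider is synchronous for brevity where a real OpenAI call would be asynchronous. On a retry, the feedback from Step 3 is appended as an extra message.

```typescript
// Hypothetical message shape, modeled on the role/content pairs used by chat-style LLM APIs.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Port for the AI agent; an OpenAI-backed class would implement this (not shown).
interface AiLayoutProvider {
  complete(messages: ChatMessage[]): string;
}

// Builds the conversion request. `feedback` carries the Step 3 result on a retry
// so the model can correct its previous attempt.
function buildConversionMessages(html: string, css: string, feedback?: string): ChatMessage[] {
  const messages: ChatMessage[] = [
    { role: "system", content: "Convert the given HTML and CSS into a JSON layout. Reply with JSON only." },
    { role: "user", content: `HTML:\n${html}\n\nCSS:\n${css}` },
  ];
  if (feedback) {
    messages.push({ role: "user", content: `The previous layout did not match the screenshot: ${feedback}` });
  }
  return messages;
}

// Step 2: call the provider and parse its reply as the raw JSON layout.
// JSON.parse throws on non-JSON output, which the caller can treat as a failed attempt.
function convertToRawLayout(provider: AiLayoutProvider, html: string, css: string, feedback?: string): unknown {
  const reply = provider.complete(buildConversionMessages(html, css, feedback));
  return JSON.parse(reply);
}

// Usage with a fake provider standing in for OpenAI; it echoes how many messages it received.
const fakeProvider: AiLayoutProvider = {
  complete: (messages) => JSON.stringify({ type: "section", messageCount: messages.length }),
};
const firstTry = convertToRawLayout(fakeProvider, "<div></div>", "div{}");
const retry = convertToRawLayout(fakeProvider, "<div></div>", "div{}", "similarity 0.81");
```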
Step 3: Validation & Comparison 🔍
Purpose: Ensure conversion accuracy through visual comparison
- Input: Raw JSON from Step 2 + Screenshot from Step 1
- Processing:
  - Render JSON layout as image
  - Compare with original screenshot
  - Calculate similarity score
- Decision Logic:
  - ✅ Similar: Proceed to Step 4
  - ❌ Different: Return to Step 2 (max 5 times)
- Smart Retry: Iterative improvement with feedback
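One simple way to compute the Step 3 similarity score is a per-channel pixel comparison, assuming the rendered layout and the screenshot have already been decoded into RGBA buffers of the same size. This metric is illustrative and not necessarily the one the project uses (structural metrics such as SSIM are a common alternative). `pixelSimilarity`, `validateLayoutImage` and the 0.95 threshold are placeholders.

```typescript
// Mean absolute difference over all RGBA channels, mapped to [0, 1].
// Assumes both buffers are non-empty and describe images of identical dimensions.
function pixelSimilarity(a: Uint8ClampedArray, b: Uint8ClampedArray): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Images must be non-empty and the same size");
  }
  let totalDiff = 0;
  for (let i = 0; i < a.length; i++) {
    totalDiff += Math.abs(a[i] - b[i]);
  }
  // 1.0 means identical; 0.0 means every channel differs by the maximum of 255.
  return 1 - totalDiff / (a.length * 255);
}

// Hypothetical threshold; the README only says it is adjustable in the validation logic.
const SIMILARITY_THRESHOLD = 0.95;

type ValidationDecision =
  | { similar: true; score: number }
  | { similar: false; score: number; feedback: string };

// Decision logic: "similar" proceeds to Step 4, otherwise feedback goes back to Step 2.
function validateLayoutImage(rendered: Uint8ClampedArray, screenshot: Uint8ClampedArray): ValidationDecision {
  const score = pixelSimilarity(rendered, screenshot);
  if (score >= SIMILARITY_THRESHOLD) return { similar: true, score };
  return {
    similar: false,
    score,
    feedback: `similarity ${score.toFixed(2)} is below ${SIMILARITY_THRESHOLD}`,
  };
}
```

Raw pixel differences are sensitive to anti-aliasing and small offsets, which is one reason the threshold may need tuning as described under Troubleshooting.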
Step 4: Finalization ✅
Purpose: Complete the conversion process
- Success Path: From Step 3 validation success
- Failure Path: When Step 3 reaches maximum retry limit
- Output: Final result status and processed data
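The loop between Step 2 and Step 3, and the two paths into Step 4, can be sketched as a bounded loop. `runConversionLoop`, `generate` and `check` are hypothetical stand-ins for the orchestration, Step 2 and Step 3. The sketch reads "max 5 times" as five returns to Step 2 after the first attempt (six generations in total); if the limit is meant to include the first attempt, lower the loop bound. It is synchronous for brevity.

```typescript
interface PipelineResult {
  status: "success" | "failure";
  attempts: number;
  layout: unknown; // last generated layout, kept even on failure for inspection
}

// Assumed reading of the README: Step 3 may send work back to Step 2 at most 5 times.
const MAX_RETRIES = 5;

function runConversionLoop(
  generate: (feedback?: string) => unknown, // Step 2: produce a raw JSON layout
  check: (layout: unknown) => { similar: boolean; feedback?: string }, // Step 3: compare with screenshot
): PipelineResult {
  let feedback: string | undefined;
  let layout: unknown = null;
  // One initial attempt plus up to MAX_RETRIES regenerations.
  for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
    layout = generate(feedback);
    const result = check(layout);
    if (result.similar) {
      return { status: "success", attempts: attempt, layout }; // success path into Step 4
    }
    feedback = result.feedback; // passed to the next Step 2 attempt
  }
  return { status: "failure", attempts: MAX_RETRIES + 1, layout }; // failure path into Step 4
}
```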
🎨 Visual Flow Design
Connection Types
- Step Flow:
Step 1 ──→ Step 2 ──→ Step 3 ──→ Step 4
- AI Integration:
Step 2 ┈┈┈→ OpenAI Agent
- Retry Loop:
Step 3 ────→ Step 2
(conditional, max 5 times)
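With ReactFlow, these connection types could be expressed as edge objects. The sketch uses a local `FlowEdge` type that mirrors a subset of ReactFlow's `Edge` fields (`id`, `source`, `target`, `label`, `style`) so it runs without the package installed. The node ids and the dash pattern are hypothetical, and arrowheads (`markerEnd`) are omitted.

```typescript
// Local type mirroring the subset of ReactFlow's Edge fields used here.
interface FlowEdge {
  id: string;
  source: string;
  target: string;
  label?: string;
  style?: { strokeDasharray?: string };
}

// SVG dash pattern used for AI agent connections (illustrative value).
const DASHED = { strokeDasharray: "6 4" };

const edges: FlowEdge[] = [
  // Solid edges: step-to-step flow.
  { id: "e1-2", source: "step1", target: "step2" },
  { id: "e2-3", source: "step2", target: "step3" },
  { id: "e3-4", source: "step3", target: "step4" },
  // Dashed edge: Step 2 calls the AI agent.
  { id: "e2-ai", source: "step2", target: "openai", style: DASHED },
  // Conditional retry loop from Step 3 back to Step 2.
  { id: "e3-2-retry", source: "step3", target: "step2", label: "retry (max 5)" },
];
```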
Visual States
| State | Visual Indicator | Description |
|---|---|---|
| Processing | 🔄 Animated border + timer | Step currently executing |
| Success | ✅ Green border | Step completed successfully |
| Failure | ❌ Red border | Step failed to complete |
| Pending | ⚪ Default border | Step waiting to execute |
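The four states in the table map naturally to a string union plus a style lookup. The colors, the `border-running` class name and the `m:ss` timer format are illustrative assumptions, not values taken from the project; `formatElapsed` and `describeStep` are hypothetical helpers.

```typescript
type StepStatus = "pending" | "processing" | "success" | "failure";

// Record<StepStatus, ...> forces every state to have a style, so a new state cannot be forgotten.
const BORDER_BY_STATUS: Record<StepStatus, { border: string; className?: string }> = {
  pending: { border: "1px solid #d0d0d0" },
  processing: { border: "2px solid #3b82f6", className: "border-running" }, // paired with a CSS animation
  success: { border: "2px solid #22c55e" },
  failure: { border: "2px solid #ef4444" },
};

// Formats the elapsed-time counter shown while a step is processing, e.g. 65000 ms -> "1:05".
function formatElapsed(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

// Text label for a step node; only the processing state shows the timer.
function describeStep(status: StepStatus, elapsedMs = 0): string {
  return status === "processing" ? `running ${formatElapsed(elapsedMs)}` : status;
}
```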
🛠️ Technology Stack
Core Dependencies
- Build System: Vite + TypeScript
- DOM Processing: jsdom, parse5
- CSS Processing: css, csso
- Utilities: lodash
- Runtime: Node.js with TypeScript support
Development Tools
- Linting: ESLint with TypeScript support
- Type Checking: TypeScript 5.7+
- Package Manager: pnpm
- Build Tool: Rollup with TypeScript plugin
📦 API Reference
Core Functions
parseHtml(html: string, css: string)
Parses HTML and CSS into a structured format for processing.
```typescript
const result = parseHtml(htmlContent, cssContent);
```
getEntities({ html, raw }: GetEntitiesProps)
Fills element attributes using parsed HTML and raw JSON data.
```typescript
const entities = getEntities({
  html: htmlString,
  raw: sectionRawData,
});
```
getStyles(input: GetStylesInput)
Extracts and processes CSS styles from input data.
```typescript
const styles = getStyles(styleInput);
```
getStylesAEP(type: string, cssText: string)
Analyzes element properties from CSS text.
```typescript
const analyzedStyles = getStylesAEP("div", cssText);
```
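To illustrate what analyzing CSS text involves, the hypothetical helper `parseDeclarations` below splits a declaration string into property/value pairs. It is not the implementation of `getStylesAEP`, whose internals are not documented here.

```typescript
// Hypothetical helper: turns "display: flex; color: red" into { display: "flex", color: "red" }.
function parseDeclarations(cssText: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const declaration of cssText.split(";")) {
    // Split on the first colon only, so values such as url(http://...) stay intact.
    const colon = declaration.indexOf(":");
    if (colon === -1) continue; // skip empty or malformed fragments
    const property = declaration.slice(0, colon).trim();
    const value = declaration.slice(colon + 1).trim();
    if (property && value) result[property] = value; // a later duplicate overwrites an earlier one
  }
  return result;
}
```

The naive split on `;` breaks on semicolons inside strings or `url(...)` values; the `css` package listed in the stack provides a full parser for those cases.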
getEEP(data: RawDataEEPV2[])
Expands element properties from raw data.
```typescript
const expandedProperties = getEEP(rawDataArray);
```
🚀 Getting Started
Prerequisites
- Node.js 18+
- pnpm (recommended) or npm
Installation
```bash
# Clone the repository
git clone <repository-url>
cd xobuilder-extractor
# Install dependencies
pnpm install
```
Development
```bash
# Start development server
pnpm dev
# Build for production
pnpm build
# Run production build
pnpm start
```
🔧 Configuration
Build Configuration
The project uses Vite for modern, fast builds with TypeScript support. Configuration can be found in:
- vite.config.ts: Build configuration
- tsconfig.json: TypeScript settings
- eslint.config.js: Code quality rules
Environment Setup
Ensure your environment supports:
- ES2020+ features
- TypeScript 5.7+
- Modern DOM APIs through jsdom
📈 Performance & Scalability
Optimization Features
- Tree Shaking: Eliminates unused code in production builds
- Code Splitting: Modular architecture supports lazy loading
- Type Safety: Full TypeScript coverage catches many errors at compile time instead of at runtime
- Efficient Parsing: Optimized HTML/CSS processing with specialized libraries
Scalability Considerations
- Clean Architecture: Easy to extend with new processing steps
- Modular Design: Independent use cases and entities
- Retry Logic: Robust error handling and recovery
- AI Integration: Pluggable AI providers through abstraction layer
🐛 Troubleshooting
Common Issues
AI Conversion Failures
- Cause: Network issues or API limitations
- Solution: Check API keys and network connectivity
- Mitigation: Automatic retry logic handles temporary failures
Visual Comparison Mismatches
- Cause: Rendering differences or screenshot quality
- Solution: Adjust similarity threshold in validation logic
- Monitoring: Track retry patterns for optimization
Memory Usage
- Cause: Large HTML/CSS files or many concurrent processes
- Solution: Implement chunking for large files
- Prevention: Monitor and limit concurrent processing
🤝 Contributing
Development Workflow
- Create feature branch from main
- Implement changes following Clean Architecture
- Add/update tests for new functionality
- Ensure TypeScript compliance
- Submit pull request with detailed description
Code Standards
- Follow existing architectural patterns
- Maintain type safety throughout
- Document complex business logic
- Use meaningful variable and function names
📄 License
This project is licensed under the terms specified in the package.json file.
Built with ❤️ by HaiUTC
For questions or support, please refer to the project documentation or create an issue in the repository.