## Documentation
- **[README](../README.md)** - Project overview and quick start
- **[Getting Started](GETTING_STARTED.md)** - Installation and first steps
- **[Architecture](ARCHITECTURE.md)** - How the parser works
- **[Dependency Resolution](DEPENDENCY_RESOLUTION.md)** - Automatic type extraction
- **[API Reference](API_REFERENCE.md)** - Command-line options and features
- **[Known Issues](KNOWN_ISSUES.md)** - Limitations and workarounds
- **[Quickstart Guide](QUICKSTART.md)** - Quick reference
- **[Roadmap](ROADMAP.md)** - Future plans and priorities
## Technical Deep Dives
For implementation details and visual guides:
- **[Dependency Flow](DEPENDENCY_FLOW.md)** - Complete technical walkthrough
- **[Visual Flow Diagrams](VISUAL_FLOW.md)** - Quick reference diagrams
- **[Multi-Field Structs](MULTI_FIELD_IMPLEMENTATION.md)** - Struct parsing details
- **[Typedef Support](TYPEDEF_IMPLEMENTATION.md)** - Typedef implementation
- **[Multi-Header Testing](MULTI_HEADER_TEST_RESULTS.md)** - Test results
## Development
- **[Development Guide](DEVELOPMENT.md)** - Contributing and extending the parser
## Archive
Historical planning documents are in `archive/` for reference.
## High-Level Architecture
```
Input (C Header) → Scanner → Declarations → Dependency Resolver → CodeGen → Output (Zig)
```
## Core Components
### 1. Scanner (`src/patterns.zig`)
**Purpose**: Parse C header files into structured declarations
**Process**:
1. Reads header file line by line
2. Tries to match each line against known patterns
3. Extracts type information, comments, and structure
4. Returns array of `Declaration` structures
**Supported Patterns**:
- Opaque types: `typedef struct SDL_X SDL_X;`
- Typedefs: `typedef Uint32 SDL_PropertiesID;`
- Enums: `typedef enum { ... } SDL_Type;`
- Structs: `typedef struct { int x, y; } SDL_Rect;`
- Flags: `typedef Uint32 SDL_Flags;` + `#define` values
- Functions: `extern SDL_DECLSPEC void SDLCALL SDL_Func(...);`
### 2. Dependency Resolver (`src/dependency_resolver.zig`)
**Purpose**: Automatically find and extract missing type definitions
**Process**:
1. Scans all declarations to find referenced types
2. Compares referenced types against defined types
3. Identifies missing types
4. Parses `#include` directives from source
5. Searches included headers for missing types
6. Extracts and clones matching declarations
**Key Features**:
- Type string normalization (strips `*`, `const`, etc.)
- Deduplication using HashMaps
- Deep cloning for safe ownership
- Selective extraction (only types needed)
### 3. Code Generator (`src/codegen.zig`)
**Purpose**: Convert C declarations to idiomatic Zig code
**Process**:
1. Groups functions by first parameter type (method categorization)
2. Generates type declarations
3. Generates function wrappers
4. Applies naming conventions
5. Performs type conversion
**Features**:
- Method organization for opaque types
- Inline function wrappers
- Automatic type conversion
- Doc comment preservation
### 4. Type Converter (`src/types.zig`)
**Purpose**: Convert C types to Zig equivalents
**Conversions**:
```zig
"bool"              → "bool"
"Uint32"            → "u32"
"int"               → "c_int"
"SDL_Type *"        → "?*Type"
"const SDL_Type *"  → "*const Type"
```
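
As a rough illustration of the scalar rows in the table above, a table-driven lookup might look like the following. The names `Mapping`, `scalar_map` and `scalarToZig` are hypothetical and are not taken from `src/types.zig`; pointer types are deliberately left out of this sketch.

```zig
const std = @import("std");

// Hypothetical lookup table holding only the scalar rows listed above.
const Mapping = struct { c: []const u8, zig: []const u8 };

const scalar_map = [_]Mapping{
    .{ .c = "bool", .zig = "bool" },
    .{ .c = "Uint32", .zig = "u32" },
    .{ .c = "int", .zig = "c_int" },
};

/// Returns the Zig spelling of a scalar C type, or null when the
/// type is not in the table (pointers, structs, unknown names).
pub fn scalarToZig(c_type: []const u8) ?[]const u8 {
    const trimmed = std.mem.trim(u8, c_type, " \t");
    for (scalar_map) |m| {
        if (std.mem.eql(u8, trimmed, m.c)) return m.zig;
    }
    return null;
}
```

Returning `null` for unknown names lets a caller fall through to pointer handling or to the naming rules instead of failing.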
### 5. Naming Convention Handler (`src/naming.zig`)
**Purpose**: Convert C names to idiomatic Zig
**Rules**:
- Strip `SDL_` prefix: `SDL_GPUDevice` → `GPUDevice`
- Remove first underscore: `SDL_GPU_TYPE` → `GPUType`
- CamelCase functions: `SDL_CreateDevice` → `createDevice`
- Lowercase first letter for values
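
A minimal sketch of the prefix-stripping and function-name rules, assuming the caller supplies the output buffer. `stripPrefix` and `functionNameToZig` are illustrative names, not the functions in `src/naming.zig`, and the underscore rule (`SDL_GPU_TYPE` → `GPUType`) is not covered here.

```zig
const std = @import("std");

const prefix = "SDL_";

/// Strips a leading `SDL_` if present; the result is a slice of the input.
pub fn stripPrefix(c_name: []const u8) []const u8 {
    if (std.mem.startsWith(u8, c_name, prefix)) return c_name[prefix.len..];
    return c_name;
}

/// Writes the function-style name into `buf`: prefix stripped and
/// first letter lowercased. Returns the written part of `buf`.
pub fn functionNameToZig(buf: []u8, c_name: []const u8) error{NoSpaceLeft}![]const u8 {
    const stripped = stripPrefix(c_name);
    if (stripped.len > buf.len) return error.NoSpaceLeft;
    @memcpy(buf[0..stripped.len], stripped);
    if (stripped.len > 0) buf[0] = std.ascii.toLower(buf[0]);
    return buf[0..stripped.len];
}
```

Type names need no copy because stripping the prefix only narrows the slice; function names are copied because the first letter changes.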
## Data Flow
### 1. Parsing Phase
```
C Header File
    ↓
Scanner.scan()
    ↓
[]Declaration {
    .opaque_type,
    .typedef_decl,
    .enum_decl,
    .struct_decl,
    .flag_decl,
    .function_decl,
}
```
### 2. Dependency Analysis Phase
```
[]Declaration
    ↓
DependencyResolver.analyze()
    ├─ collectDefinedTypes()    → defined_types HashMap
    └─ collectReferencedTypes() → referenced_types HashMap
    ↓
getMissingTypes()
    ↓
missing_types = referenced - defined
```
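
The `referenced - defined` step is a plain set difference. A sketch using `std.StringHashMap(void)` as the set type is shown below; `TypeSet` and `collectMissing` are assumed names, and the keys placed in `out` are borrowed from `referenced` rather than duplicated.

```zig
const std = @import("std");

const TypeSet = std.StringHashMap(void);

/// Inserts every name that is in `referenced` but not in `defined`
/// into `out`. Keys are borrowed, so `referenced` must outlive `out`.
pub fn collectMissing(
    referenced: *const TypeSet,
    defined: *const TypeSet,
    out: *TypeSet,
) !void {
    var it = referenced.keyIterator();
    while (it.next()) |name| {
        if (!defined.contains(name.*)) try out.put(name.*, {});
    }
}
```

Using a hash set for the result also deduplicates names that are referenced more than once.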
### 3. Dependency Resolution Phase
```
For each missing_type:
    Parse #include directives
    For each included header:
        Read header file
        Scanner.scan()
        Search for matching type
        If found: cloneDeclaration()
```
### 4. Code Generation Phase
```
[]Declaration (primary + dependencies)
    ↓
CodeGen.generate()
    ├─ categorizeDeclarations() (group methods)
    ├─ writeHeader()
    └─ writeDeclarations()
        ├─ writeOpaqueWithMethods()
        ├─ writeTypedef()
        ├─ writeEnum()
        ├─ writeStruct()
        ├─ writeFlags()
        └─ writeFunction()
    ↓
Zig source code (string)
```
### 5. Validation Phase
```
Generated Zig code
    ↓
std.zig.Ast.parse()
    ↓
Check for syntax errors
    ↓
ast.renderAlloc() (format)
    ↓
Write to file or stdout
```
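
The syntax check in this phase can be sketched roughly as follows. `isValidZig` is a hypothetical helper, not a function from this project; `std.zig.Ast.parse` is the standard library parser, and its exact signature may vary between Zig releases, so treat this as a sketch to be checked against your compiler version. The formatting step is omitted.

```zig
const std = @import("std");

/// Returns true when `source` parses without syntax errors.
/// The parser records errors in the tree instead of failing, so the
/// only error this function can return is an allocation failure.
pub fn isValidZig(gpa: std.mem.Allocator, source: [:0]const u8) !bool {
    var ast = try std.zig.Ast.parse(gpa, source, .zig);
    defer ast.deinit(gpa);
    return ast.errors.len == 0;
}
```

Because parse errors are collected rather than thrown, the caller can print them and still write the output, which matches the non-fatal handling of syntax errors described under Error Handling.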
## Key Algorithms
### Type Extraction
**Purpose**: Strip pointer/const decorators to get base type
```zig
"SDL_Window *"         → "SDL_Window"
"?*const SDL_Rect"     → "SDL_Rect"
"SDL_Buffer *const *"  → "SDL_Buffer"
```
**Algorithm**:
1. Trim whitespace
2. Remove leading qualifiers (`const`, `*`, `?`)
3. Remove trailing qualifiers (`*`, `*const`, ` const`)
4. Handle special patterns (`[*c]`)
5. Return base type string
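
One way to realize the steps above without allocating is to narrow a slice until no decorator remains. This is a sketch under that assumption; `baseType` and `isIdentChar` are illustrative names and may not match the resolver's actual helper.

```zig
const std = @import("std");

fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

/// Returns the base type name as a slice of `type_str` (no allocation).
pub fn baseType(type_str: []const u8) []const u8 {
    const leading = [_][]const u8{ "const ", "[*c]", "?", "*" };
    var s = std.mem.trim(u8, type_str, " \t");
    var changed = true;
    while (changed) {
        changed = false;
        // Leading decorators: C `const`, Zig `?`, `*` and `[*c]`.
        for (leading) |p| {
            if (std.mem.startsWith(u8, s, p)) {
                s = std.mem.trim(u8, s[p.len..], " \t");
                changed = true;
            }
        }
        // Trailing decorators: `*`, or a `const` that is a separate
        // word (so a name that merely ends in "const" is kept).
        if (std.mem.endsWith(u8, s, "*")) {
            s = std.mem.trim(u8, s[0 .. s.len - 1], " \t");
            changed = true;
        } else if (std.mem.endsWith(u8, s, "const") and
            (s.len == 5 or !isIdentChar(s[s.len - 6])))
        {
            s = std.mem.trim(u8, s[0 .. s.len - 5], " \t");
            changed = true;
        }
    }
    return s;
}
```

Looping until nothing changes handles stacked decorators such as `*const *` without listing every combination.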
### Multi-Field Parsing
**Purpose**: Handle C compact syntax like `int x, y;`
**Algorithm**:
1. Detect comma in field declaration
2. Extract common type (before first field name)
3. Split remaining part on commas
4. Create separate `FieldDecl` for each name
5. Return array of fields
**Example**:
```c
int x, y;  →  [FieldDecl{.name="x", .type="int"},
               FieldDecl{.name="y", .type="int"}]
```
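
The splitting steps can be sketched as below. This `FieldDecl` is a simplified stand-in for the project's struct (the type field is named `c_type` here as an assumption), results are written into a caller-provided buffer, and per-name pointer declarators such as `int *x, *y;` are not handled.

```zig
const std = @import("std");

// Simplified stand-in for the scanner's FieldDecl.
pub const FieldDecl = struct { name: []const u8, c_type: []const u8 };

/// Splits a declaration such as `int x, y;` into one FieldDecl per
/// name. Results are slices into `line`; returns the filled part of `out`.
pub fn splitFields(
    line: []const u8,
    out: []FieldDecl,
) error{ NoSpaceLeft, Malformed }![]FieldDecl {
    var decl = std.mem.trim(u8, line, " \t");
    if (std.mem.endsWith(u8, decl, ";")) decl = decl[0 .. decl.len - 1];

    // The common type is everything before the first field name, i.e.
    // up to the last space in the segment before the first comma.
    const first_end = std.mem.indexOfScalar(u8, decl, ',') orelse decl.len;
    const first = std.mem.trim(u8, decl[0..first_end], " \t");
    const split = std.mem.lastIndexOfScalar(u8, first, ' ') orelse return error.Malformed;
    const c_type = std.mem.trim(u8, first[0..split], " \t");

    // `decl` has no leading whitespace, so `split` is also a valid
    // index into `decl`; everything after it is the list of names.
    var count: usize = 0;
    var names = std.mem.splitScalar(u8, decl[split + 1 ..], ',');
    while (names.next()) |raw| {
        const name = std.mem.trim(u8, raw, " \t");
        if (name.len == 0) return error.Malformed;
        if (count == out.len) return error.NoSpaceLeft;
        out[count] = .{ .name = name, .c_type = c_type };
        count += 1;
    }
    return out[0..count];
}
```

A single-name declaration such as `float w;` takes the same path and yields one field, so the caller does not need a separate comma check.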
### Method Categorization
**Purpose**: Determine if function should be a method
**Algorithm**:
1. Check if function has parameters
2. Get type of first parameter
3. Check if type is an opaque type pointer
4. If yes, add to opaque type's methods
5. If no, write as standalone function
**Example**:
```c
void SDL_Destroy(SDL_GPUDevice *d)  →  Method of GPUDevice
void SDL_Init(void)                 →  Standalone function
```
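
The decision can be sketched as a small predicate over the first parameter's C type. `methodOwner` is a hypothetical name, and the list of opaque type names is assumed to have been collected during scanning.

```zig
const std = @import("std");

/// Returns the opaque type a function should attach to as a method,
/// or null when it should be written as a standalone function.
/// Pass null for `first_param_type` when the function takes no parameters.
pub fn methodOwner(
    first_param_type: ?[]const u8,
    opaque_types: []const []const u8,
) ?[]const u8 {
    const param = std.mem.trim(u8, first_param_type orelse return null, " \t");
    // Only a pointer parameter can make the function a method.
    if (!std.mem.endsWith(u8, param, "*")) return null;
    var base = std.mem.trim(u8, param[0 .. param.len - 1], " \t");
    if (std.mem.startsWith(u8, base, "const ")) {
        base = std.mem.trim(u8, base["const ".len..], " \t");
    }
    for (opaque_types) |name| {
        if (std.mem.eql(u8, base, name)) return name;
    }
    return null;
}
```

A linear scan keeps the sketch short; with many opaque types a hash set lookup would be the natural replacement.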
## Memory Management
### Ownership Rules
1. **Scanner owns strings** during parsing (allocated from its allocator)
2. **Parser owns declarations** after scanning (freed at end of main)
3. **Resolver owns HashMap keys** (duped when inserted, freed in deinit)
4. **Cloned declarations own strings** (allocated explicitly, freed by caller)
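
Rule 4 can be illustrated with a deliberately small declaration type. `OpaqueDecl` below is a simplified stand-in, not the project's `Declaration` union: every string is duplicated on clone, so the copy stays valid after the scanner's source buffer is freed, and the caller frees it with the same allocator.

```zig
const std = @import("std");

// Simplified stand-in for one Declaration variant.
pub const OpaqueDecl = struct {
    name: []const u8,
    doc: ?[]const u8,

    /// Deep copy: the clone owns its own copies of all strings.
    pub fn clone(self: OpaqueDecl, allocator: std.mem.Allocator) !OpaqueDecl {
        const name = try allocator.dupe(u8, self.name);
        // Release the first copy if duplicating the doc comment fails.
        errdefer allocator.free(name);
        const doc = if (self.doc) |d| try allocator.dupe(u8, d) else null;
        return .{ .name = name, .doc = doc };
    }

    /// Frees the strings owned by a cloned declaration.
    pub fn deinit(self: OpaqueDecl, allocator: std.mem.Allocator) void {
        allocator.free(self.name);
        if (self.doc) |d| allocator.free(d);
    }
};
```

The `errdefer` matters because a partially built clone would otherwise leak its first string when the second allocation fails.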
### Allocation Strategy
```
GPA (General Purpose Allocator)
├─ Primary header source (freed at end)
├─ Primary declarations (freed with deep free)
├─ DependencyResolver
│ ├─ referenced_types HashMap (keys owned)
│ └─ defined_types HashMap (keys borrowed)
├─ Missing types array (freed explicitly)
├─ Includes array (freed explicitly)
├─ Dependency declarations (freed with deep free)
└─ Generated output (freed after writing)
```
### Cleanup Pattern
```zig
defer {
    for (decls) |decl| {
        freeDeclDeep(allocator, decl);
    }
    allocator.free(decls);
}
```
## Error Handling
### Fatal Errors (Exit Immediately)
- File not found (primary header)
- Out of memory
- Cannot write output file
### Non-Fatal Errors (Continue with Warnings)
- Dependency header not readable → Skip, try next
- Type not found in any header → Print warning, continue
- Struct parsing error → Generate partial, continue
- Syntax errors in output → Print errors, write anyway
### Error Recovery
The parser uses graceful degradation:
1. Try to extract as much as possible
2. Warn about issues
3. Continue processing
4. Generate best-effort output
This allows partial success even with problematic headers.
## Extension Points
### Adding New Pattern Support
1. Add new variant to `Declaration` union in `patterns.zig`
2. Implement `scan*()` function to match pattern
3. Add to pattern matching chain in `Scanner.scan()`
4. Update all switch statements:
   - Cleanup code in `parser.zig`
   - `cloneDeclaration()` in `dependency_resolver.zig`
   - `freeDeclaration()` in `dependency_resolver.zig`
5. Implement `write*()` in `codegen.zig`
### Adding Type Conversions
Edit `src/types.zig`:
```zig
pub fn convertType(c_type: []const u8, allocator: Allocator) ![]const u8 {
    // Add new conversion here
    if (std.mem.eql(u8, c_type, "MyType")) {
        return try allocator.dupe(u8, "MyZigType");
    }
    // ...
}
```
### Adding Naming Rules
Edit `src/naming.zig`:
```zig
pub fn typeNameToZig(c_name: []const u8) []const u8 {
    // Add custom naming logic
}
```
## Performance Characteristics
### Time Complexity
- **Primary parsing**: O(n) where n = source lines
- **Dependency analysis**: O(d) where d = declarations
- **Type extraction**: O(h × d) where h = headers, d = declarations per header
- **Code generation**: O(d) where d = total declarations
**Overall**: O(n + h×d) - Linear for typical use
### Space Complexity
- **Declarations**: O(d) where d = declaration count
- **HashMaps**: O(t) where t = unique type names
- **Output**: O(d) where d = declaration count
**Peak memory**: ~2-5MB for SDL_gpu.h (169 declarations)
### Optimization Points
Current optimizations:
- HashMap-based deduplication
- Early exit when type found
- Selective parsing (only missing types)
- String interning for type names
Potential improvements:
- Cache parsed headers (avoid re-parsing)
- Parallel header processing
- Lazy header loading
## Testing Strategy
### Unit Tests (`test/`)
- Pattern matching tests (each C pattern)
- Type conversion tests
- Naming convention tests
- Dependency resolution tests
- Multi-field parsing tests
### Integration Tests
- Real SDL headers (SDL_gpu.h)
- Dependency chain resolution
- End-to-end parsing and generation
### Validation
- AST parsing of generated code
- Memory leak detection (GPA)
- No regressions (all tests must pass)
## Code Organization
```
src/
├── parser.zig               # Main entry point, CLI handling
├── patterns.zig             # Pattern matching and scanning
├── types.zig                # C to Zig type conversion
├── naming.zig               # Naming convention handling
├── codegen.zig              # Zig code generation
├── mock_codegen.zig         # C mock generation
└── dependency_resolver.zig  # Dependency analysis and extraction
test/
└── (various test files)
docs/
├── GETTING_STARTED.md       # Installation and first steps
├── ARCHITECTURE.md          # Architecture overview (this file)
├── DEPENDENCY_RESOLUTION.md # Dependency system details
└── ...
```
## Next Steps
- Read [Dependency Resolution](DEPENDENCY_RESOLUTION.md) for details on automatic type extraction
- See [API Reference](API_REFERENCE.md) for all command-line options
- Check [Known Issues](KNOWN_ISSUES.md) for current limitations
- Review [Development](DEVELOPMENT.md) to contribute
---
**Related Documents**:
- Technical deep dive: [docs/DEPENDENCY_FLOW.md](DEPENDENCY_FLOW.md)
- Visual diagrams: [docs/VISUAL_FLOW.md](VISUAL_FLOW.md)