Dependency-free Markdown-to-HTML rendering engine utilizing strict Regex tokenization.
Core Engineer
Parsing Markdown without external libraries presents a significant Ambiguity Problem. Symbols like `*` or `_` are context-dependent—they can denote a list item, italics, bold text, or literal characters depending on their position. A naive approach fails when formats nest or overlap (e.g., a link containing bold text). The core challenge was designing a processing pipeline that resolves these conflicts deterministically without building a heavy Abstract Syntax Tree (AST).
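To make the ambiguity concrete, here is a minimal sketch of the kind of naive rule described above. The function name `naive_italics` and its pattern are illustrative assumptions, not code from the project; they only show how a context-blind match damages an ordinary identifier.

```python
import re

def naive_italics(line: str) -> str:
    # Naive rule (illustrative only): anything between two underscores is italic.
    return re.sub(r'_(.+?)_', r'<em>\1</em>', line)

print(naive_italics("an _emphasized_ word"))
# -> an <em>emphasized</em> word

print(naive_italics("call my_variable_name here"))
# -> call my<em>variable</em>name here   (the identifier is corrupted)
```

The second call shows why position matters: the underscores inside the identifier are literal characters, yet the rule treats them as emphasis markers.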
Sequential Tokenization Pipeline
    import re

    def apply_inline_formats(line: str) -> str:
        """
        Strict execution order ensures data integrity.
        Links must be processed first to prevent bold/italic markers
        inside URLs (e.g., underscores) from being corrupted.
        """
        line = convert_link(line)      # Priority 1: Protect URLs
        line = convert_code(line)      # Priority 2: Protect inline code
        line = convert_emphasis(line)  # Priority 3: Formatting
        return line

    def convert_emphasis(line: str) -> str:
        # The lookbehind (?<!\w) and lookahead (?!\w) ensure we only match
        # double underscores that sit strictly on word borders.
        # Matches 'bold' in '__bold__' but ignores 'variable' in 'my_variable_name'
        line = re.sub(r'(?<!\w)__([^_]+)__(?!\w)', r'<strong>\1</strong>', line)
        return line

I opted for a Regex-based Sequential Pipeline over a full AST parser. While an AST supports arbitrarily deep nesting, a Regex pipeline provides O(n) performance for typical documents and requires zero external dependencies, making it ideal for lightweight embedded script environments. The known limitation of this design, order dependency, was mitigated by enforcing a fixed call order inside `apply_inline_formats`.
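The word-boundary guard in the emphasis rule can be exercised in isolation. The sketch below assumes the pattern is meant to use `\w` inside both lookarounds and `\1` as the backreference; `convert_strong` and `STRONG` are hypothetical names chosen for this example, not the project's own.

```python
import re

# Assumed form of the guarded pattern, with the \w escapes written out.
STRONG = re.compile(r'(?<!\w)__([^_]+)__(?!\w)')

def convert_strong(line: str) -> str:
    # Hypothetical helper: wraps standalone __text__ in <strong> tags.
    return STRONG.sub(r'<strong>\1</strong>', line)

samples = [
    "__bold__ text",        # delimiters on word borders: converted
    "my__variable__name",   # delimiters inside a word: left alone
    "my_variable_name",     # single underscores: never a candidate
    "__unclosed text",      # no closing delimiter: left alone
]
for s in samples:
    print(repr(s), "->", repr(convert_strong(s)))
```

Only the first sample changes. The other three pass through untouched because the lookbehind rejects a delimiter preceded by a word character, and the pattern requires a closing `__` before it will match at all.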
Engineered a `pytest` suite covering 8 distinct edge-case categories, following TDD principles.
Validated Intra-word Protection (`test_convert_emphasis_invalid_underscore`) to ensure variables like `my_variable_name` remain unformatted. Ensured Graceful Degradation for unclosed tags (`test_convert_code_unclosed`, `test_convert_emphasis_unclosed`), so that malformed input renders as raw text rather than crashing the pipeline. Additionally, added strict HTML-escaping tests (`test_convert_paragraph_special_chars`) that verify special characters like `<` and `&` are escaped on output, reducing the risk of XSS vulnerabilities.
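A rough sketch of what two of these tests might look like follows. The test names come from the list above, but the bodies of `convert_code` and `convert_paragraph` are hypothetical stand-ins written for this example (the real implementations are not shown here), and the use of `html.escape` is an assumption about how escaping could be done with the standard library alone.

```python
import html
import re

def convert_code(line: str) -> str:
    # Hypothetical stand-in: only a closed pair of backticks becomes <code>.
    return re.sub(r'`([^`]+)`', r'<code>\1</code>', line)

def convert_paragraph(text: str) -> str:
    # Hypothetical stand-in: escape HTML-significant characters, then wrap.
    return f"<p>{html.escape(text)}</p>"

def test_convert_code_unclosed():
    # An unmatched backtick should pass through as raw text, not raise.
    assert convert_code("run `make") == "run `make"

def test_convert_paragraph_special_chars():
    # '<' and '&' must come out as entities so they cannot form markup.
    assert convert_paragraph("a < b & c") == "<p>a &lt; b &amp; c</p>"
```

Run under `pytest`, both tests pass against these stand-ins. The point is the shape of the checks: malformed input is compared against its own raw text, and escaped output is compared against explicit entity strings.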
Demonstrated mastery of Core Computer Science fundamentals (String Manipulation & Regex) without relying on "Magic Libraries." This lightweight engine was designed to be drop-in compatible for environments where installing `pip` packages like `markdown`, or external tools like `pandoc`, is restricted.