Overview
The String Inspector analyzes text input and reports detailed statistics: character counts, byte lengths, encoding detection, and character frequency. It is useful for understanding text properties, diagnosing encoding issues, and sizing text processing requirements.
Use Cases
- Character Counting: Get accurate grapheme counts (not just code points)
- Encoding Analysis: Detect ASCII vs UTF-8 and calculate byte sizes
- Database Planning: Determine storage requirements for text columns
- Text Processing: Understand character distribution for algorithms
- Content Validation: Verify text properties meet requirements
- Debugging: Investigate encoding issues and hidden characters
Input Format
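Paste any text to analyze.
Output Format
Provides text statistics: grapheme count, UTF-8 and UTF-16 byte lengths, detected encoding (ASCII or UTF-8), and the most frequent characters.
Metrics Explained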
Grapheme Count
The number of user-perceived characters, properly handling:
- Emoji (including multi-codepoint emoji like 👨‍👩‍👧‍👦)
- Combining diacritics (é counted as one, not e + ́)
- Regional indicators (flag emoji)
Byte Lengths
- UTF-8: Variable-width encoding (1-4 bytes per code point)
- UTF-16: 2 bytes for each BMP code point, 4 bytes (a surrogate pair) for each supplementary code point
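The two byte lengths can be computed with standard JavaScript facilities. The following is a minimal sketch, not the tool's actual code; `utf8ByteLength` and `utf16ByteLength` are hypothetical helper names, and the sketch assumes no byte order mark is counted.

```typescript
// Hypothetical helpers for illustration; the tool's real code may differ.

function utf8ByteLength(text: string): number {
  // TextEncoder always produces UTF-8 output.
  return new TextEncoder().encode(text).length;
}

function utf16ByteLength(text: string): number {
  // JavaScript strings are sequences of UTF-16 code units (2 bytes each).
  // A supplementary character occupies two code units (a surrogate pair),
  // so it contributes 4 bytes here.
  return text.length * 2;
}

for (const sample of ["A", "\u00E9", "\u20AC", "\u{1F600}"]) {
  console.log(sample, utf8ByteLength(sample), utf16ByteLength(sample));
}
// A  -> UTF-8: 1, UTF-16: 2
// é  -> UTF-8: 2, UTF-16: 2
// €  -> UTF-8: 3, UTF-16: 2
// 😀 -> UTF-8: 4, UTF-16: 4
```

The gap between the two columns is what matters for database planning: a column limit expressed in bytes can be reached well before the same limit expressed in characters.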
Encoding Detection
- ASCII: Every character is in the range 0x00-0x7F
- UTF-8: At least one character falls outside the ASCII range
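The rule above amounts to a single range check. A minimal sketch follows, assuming an empty string is reported as ASCII (the source does not say); `detectEncoding` and `DetectedEncoding` are hypothetical names, not the tool's API.

```typescript
// Hypothetical names for illustration; not taken from the tool's source.
type DetectedEncoding = "ASCII" | "UTF-8";

function detectEncoding(text: string): DetectedEncoding {
  for (let i = 0; i < text.length; i++) {
    // Any UTF-16 code unit above 0x7F means the text contains
    // a character outside the ASCII range.
    if (text.charCodeAt(i) > 0x7f) {
      return "UTF-8";
    }
  }
  // Assumption: text with no characters at all is classified as ASCII.
  return "ASCII";
}

console.log(detectEncoding("plain text\n")); // ASCII
console.log(detectEncoding("caf\u00E9"));    // UTF-8
```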
Character Frequency
Top 12 most frequent characters with their counts, useful for:
- Text analysis and pattern detection
- Compression estimation
- Identifying unusual characters
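A frequency table like this can be built with a `Map` and a sort. The sketch below is illustrative only: `topCharacters` is a hypothetical helper, and it counts by code point, whereas the tool may count by grapheme. Ties keep first-occurrence order because `Map` preserves insertion order and `Array.prototype.sort` is stable.

```typescript
// Hypothetical helper for illustration; the default limit of 12 mirrors the text above.
function topCharacters(text: string, limit = 12): Array<[string, number]> {
  const counts = new Map<string, number>();
  // Array.from on a string yields whole code points, not UTF-16 code units.
  for (const ch of Array.from(text)) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1]) // highest count first
    .slice(0, limit);
}

console.log(JSON.stringify(topCharacters("hello world", 3)));
// [["l",3],["o",2],["h",1]]
```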
Examples
Implementation Details
From lib/tools/engine.ts:566-592: grapheme counting uses Intl.Segmenter, providing accurate counts for complex Unicode sequences including emoji with ZWJ (Zero-Width Joiner) sequences.
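To show what counting with `Intl.Segmenter` looks like, here is a minimal sketch. It is not the code from lib/tools/engine.ts; `countGraphemes` is a hypothetical helper name, and the optional `locale` parameter is an assumption added for illustration.

```typescript
// Illustrative sketch only; not the code from lib/tools/engine.ts.
function countGraphemes(text: string, locale?: string): number {
  // "grapheme" granularity splits text into user-perceived characters.
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text)).length;
}

// Family emoji: four emoji joined by three Zero-Width Joiners (U+200D).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(countGraphemes(family));    // 1 grapheme
console.log(Array.from(family).length); // 7 code points
console.log(family.length);             // 11 UTF-16 code units

console.log(countGraphemes("e\u0301"));            // 1 (e + combining acute accent)
console.log(countGraphemes("\u{1F1EF}\u{1F1F5}")); // 1 (two regional indicators forming a flag)
```

The three numbers printed for the family emoji show why `string.length` alone is a poor measure of what a user sees as one character.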