
You paste Telugu text from a website into PageMaker. Instead of అమ్మ, you see F087 F080 F05C. That is an encoding failure. This guide explains why it happens, how to fix it, and gives you the exact Anu↔Unicode mappings no other website provides.
🔍 Quick Fix (30 seconds): If your Telugu text shows as F087 F080 F05C or boxes (□□□), you have an Anu to Unicode mismatch. Scroll down to the Telugu Script: Anu to Unicode Mapping section for the exact character table, or use the Unicode to Non Unicode converter above to fix it instantly.
What Happens When Encoding Fails? (Mojibake Explained)
Have you ever opened a file and seen something like à®¤à®®à®¿à®´à¯ instead of Tamil text? Or maybe F087 F080 F05C instead of Telugu? That garbage text has a name: mojibake.
Mojibake (文字化け) is a Japanese word that means “character transformation.” It happens when your computer reads text using the wrong encoding system.
Real-world scenarios where encoding fails:
- You copy Telugu text from a website and paste it into PageMaker or CorelDraw. The text turns into random symbols.
- You export data from a database, and all Telugu characters show up as ????.
- Someone emails you a document, but the attachment opens as gibberish.
- Windows displays boxes (□□□) instead of Telugu fonts in old software. (Learn how to fix this in our guide on Language for Non-Unicode Programs in Windows.)
Visual example:
| What you should see | What you actually see (mojibake) |
| తెలుగు (Telugu) | F0C1 F0E8 F0C8 F0D2 F0A0 (Anu encoding) |
| నమస్కారం (Namaskaram) | F09C F0B0 F0C3 F0C2 F0BE F0B0 F0D2 |
💡 Tip 1: The symptom tells you what went wrong. Question marks (???) usually mean the characters were lost when the text was converted to an encoding that cannot represent them. Boxes (□□□) mean the characters are intact, but the current font has no glyph for them. Random symbols mean the bytes are being read with the wrong encoding.
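The short Python sketch below reproduces two of these symptoms with an ordinary string. It is only an illustration: Latin-1 and code page 1252 stand in for "the wrong encoding", and the variable names are made up for this example.

```python
# Sketch: reproduce two encoding failure modes with a plain Python string.
text = "తెలుగు"

# 1) Wrong decoder: UTF-8 bytes read as Latin-1 turn into random symbols.
utf8_bytes = text.encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")

# 2) Target encoding cannot represent the characters: each becomes "?".
question_marks = text.encode("cp1252", errors="replace")

# Mojibake is reversible as long as no bytes were dropped along the way.
recovered = mojibake.encode("latin-1").decode("utf-8")

print(repr(mojibake))
print(question_marks)      # b'??????'
print(recovered == text)   # True
```

Note the difference: the mojibake can be undone because every byte is still there, while the question marks are permanent data loss.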
What is a Byte Order Mark (BOM)?
A BOM is a special byte sequence at the very beginning of a file that tells the computer which encoding was used. For UTF-8, the BOM is EF BB BF (three bytes). For UTF-16 little-endian, the BOM is FF FE. For UTF-16 big-endian, the BOM is FE FF.
Some old software adds a BOM automatically. Other software crashes when it sees one. This is why you sometimes open a file and see ï»¿ at the beginning of the text: that is the UTF-8 BOM being displayed as actual characters instead of being interpreted as encoding information.
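As a rough illustration, the following Python sketch checks a byte string for the three BOMs listed above. The `detect_bom` helper is a hypothetical name for this example, not a standard library function; the `codecs.BOM_*` constants and the `utf-8-sig` codec are part of Python's standard library.

```python
import codecs

# BOM signatures named in the text, paired with a Python codec name.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),      # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def detect_bom(data: bytes):
    """Return (codec_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

sample = codecs.BOM_UTF8 + "క".encode("utf-8")
print(detect_bom(sample))                 # ('utf-8-sig', 3)
print(sample.decode("utf-8-sig"))         # the BOM is stripped on decode
print(repr(sample[:3].decode("cp1252")))  # 'ï»¿' when misread as Windows-1252
```

This sketch ignores UTF-32, whose little-endian BOM also begins with FF FE, so a production detector would need to check that case first.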
Fact 1 – Source: Unicode Consortium
“The Unicode Standard covers 154,998 characters from 168 scripts, making it the only character encoding system capable of representing virtually every written language in a single, unified framework.”
Source: Unicode Consortium Unicode 16.0 Standard
Fact 2 – Source: Unicode Consortium
“The Byte Order Mark (BOM) is a Unicode character (U+FEFF) used to indicate the byte order and encoding form of a text file. For UTF-8, the BOM is EF BB BF in hexadecimal.”
Source: Unicode Consortium BOM FAQ
What Is Unicode and Non-Unicode?
Before you can fix encoding problems, you need to understand what Unicode and non-Unicode actually mean. For a complete plain-language introduction, read our guide: What is Unicode and Non-Unicode? A Complete Plain Language Guide.
Unicode – The Universal Character Encoding Standard
Think of Unicode as a giant phone book for every character on Earth. Every letter, number, symbol, and emoji from every language has its own unique number. That number is called a code point. To understand exactly how Unicode works in computers, including binary and byte-level storage, see our detailed technical guide.
For example:
- The English letter ‘A’ has code point U+0041
- The Telugu letter ‘క’ (ka) has code point U+0C15
- The Hindi letter ‘क’ (ka) has code point U+0915
Notice that Telugu క and Hindi क both represent the sound “ka” but have different code points. That is because they belong to different scripts (Telugu and Devanagari).
Unicode is maintained by the Unicode Consortium, a non-profit organisation based in Mountain View, California.
The three most common ways to store Unicode text are:
- UTF-8 – Uses 1 to 4 bytes per character. This is what the web uses.
- UTF-16 – Uses 2 or 4 bytes per character. Windows and SQL Server use this. Can be little-endian (Windows) or big-endian (Unix).
- UTF-32 – Uses 4 bytes for every character. Simple but wastes space.
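To make the size differences concrete, here is a small Python sketch that counts the bytes each encoding needs for three sample characters. The `byte_lengths` helper is a name invented for this example; the `-le` codec variants are used so that no BOM is added to the count.

```python
def byte_lengths(ch: str) -> dict:
    """Number of bytes needed to store one character in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}

for ch in ("A", "క", "😀"):
    print(f"U+{ord(ch):04X}", byte_lengths(ch))

# U+0041  {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# U+0C15  {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# U+1F600 {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```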
What is Endianness?
Endianness refers to the order in which the bytes of a multi-byte value are stored. Little-endian stores the least significant byte first. Big-endian stores the most significant byte first. UTF-16 and UTF-32 each come in both byte orders, for example UTF-16LE (little-endian, used by Windows) and UTF-16BE (big-endian, used by some Unix systems). If you open a UTF-16BE file on a system expecting UTF-16LE, you will see garbled text.
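A minimal Python sketch of the byte-order problem, using the Telugu letter క (U+0C15) as the sample character. The variable names are only illustrative.

```python
ka = "క"  # U+0C15

le = ka.encode("utf-16-le")  # least significant byte first
be = ka.encode("utf-16-be")  # most significant byte first
print(le.hex(), be.hex())    # 150c 0c15

# Reading big-endian bytes as little-endian swaps the two bytes,
# so U+0C15 is misread as U+150C, a character from another script.
wrong = be.decode("utf-16-le")
print(f"U+{ord(wrong):04X}")  # U+150C
```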
Fact 3 – Source: W3Techs
“UTF-8 encoding is used by over 98% of all websites on the internet as of 2025.”
Source: W3Techs UTF-8 Usage Statistics
Non-Unicode – Legacy and Code Page-Based Systems
Before Unicode was invented in the 1990s, every software company made its own encoding system. These older systems are called non-Unicode or legacy encoding.
Instead of one giant universal list, non-Unicode systems use a code page. A code page is a small lookup table that maps byte values to characters for one specific language.
Here is the problem: a single-byte code page can only hold 256 characters. That is fine for English (which has 26 letters), but Telugu has over 200 characters. Telugu needs its own special code page.
Examples of non-Unicode systems:
- ASCII – The original standard. Only 128 characters. English only.
- Windows-1252 – Default Windows code page for Western languages.
- ISCII – Indian Script Code for Information Interchange. A government standard before Unicode.
- Anu Script – Custom encoding for Telugu fonts. Very common in Andhra Pradesh and Telangana. Use our dedicated Unicode to Anu Converter for accurate Telugu conversion.
- Nudi – Legacy encoding for Kannada (and sometimes Telugu).
- Kruti Dev – Legacy encoding for Hindi.
💡 Tip 2: Non-Unicode is not one system. It is hundreds of different systems that do not talk to each other. Anu Telugu text opened on a Windows-1252 system shows completely different, meaningless characters.
Fact 4 – Source: Microsoft Documentation
“A single-byte non-Unicode code page can represent only 256 characters, while Unicode supports 1,114,112 possible code points – a capacity difference of more than 4,000 times.”
Source: Microsoft Code Pages Documentation
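The capacity figures above can be checked with a few lines of arithmetic. This is only a worked calculation; the constant names are chosen for this example.

```python
# Unicode code points run from U+0000 to U+10FFFF inclusive.
UNICODE_CODE_POINTS = 0x10FFFF + 1

# A single byte has 8 bits, so a one-byte code page has 2**8 slots.
CODE_PAGE_SLOTS = 2 ** 8

ratio = UNICODE_CODE_POINTS // CODE_PAGE_SLOTS
print(UNICODE_CODE_POINTS, CODE_PAGE_SLOTS, ratio)  # 1114112 256 4352
```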
Difference Between Unicode and Non-Unicode
Here is a simple comparison table to help you see the differences clearly. For a deeper side-by-side breakdown, visit our guide: Unicode vs Non-Unicode Key Differences Explained Simply.
| Feature | Unicode | Non-Unicode (like Anu Script) |
| Characters supported | 154,998+ across all scripts | Only 256 per code page |
| Can you mix languages? | Yes – Telugu, Hindi, English in same file | No – one code page = one language family |
| Storage for Telugu character | 3 bytes (UTF-8) | 1 byte |
| Works on web browsers? | Yes – perfectly | No – breaks immediately |
| Works in PageMaker/CorelDraw? | No – these old tools don’t support Unicode | Yes – if you have the right font |
| Example fonts | Gautami, Noto Sans Telugu, Mangal | Anu Script, Eemaata, Nudi |
| SQL Server data types | nvarchar, nchar | varchar, char |
The key takeaway: Use Unicode for everything new – websites, mobile apps, modern databases, emails. Only use non-Unicode when you absolutely have to edit old files created in PageMaker, CorelDraw, or government systems that still require Anu fonts.
How to Convert Unicode to Non-Unicode Text (Step-by-Step)
Converting your text takes less than sixty seconds. Here is exactly how to do it using the converter on this website.
Step 1: Open the Unicode to Non Unicode converter tool at the top of this page.
Step 2: Paste your Unicode text into the input field. This is the normal text you see on any modern phone, computer, or website.
Step 3: Select the target non-Unicode font from the dropdown menu. For Telugu, choose Anu Script. For Kannada, choose Nudi. For Hindi, choose Kruti Dev.
Step 4: Click the “Convert” button.
Step 5: Check the converted output in the result box. Make sure all characters, conjuncts, and ligatures look correct.
Step 6: Click the copy button to copy the converted result.
Step 7: Paste it into your target application – PageMaker, CorelDraw, QuarkXPress, PixelLab, Microsoft Word with legacy fonts, or any other program that needs non-Unicode text.
⚠️ Important warning: Unicode to non-Unicode conversion can cause character loss or truncation if the target encoding does not support every character in your source text. Diacritics, special punctuation marks, and characters outside the target code page’s range are the first to be affected. Always verify your output before using it in any production document.
💡 Tip 3: Always test your converted output with a single paragraph before converting an entire book. Different versions of Anu (7.0 versus 7.1) have slight mapping variations. What works in Anu 7.0 may look wrong in Anu 7.1.
How UTF-8 Encodes Characters (Byte-Level Deep Dive)
This section is for those who want to understand what is actually happening inside the computer. Even if you are not a programmer, reading this will help you understand why encoding errors happen. For a complete comparison of what encoding systems are in Unicode (UTF-8 vs UTF-16), see our dedicated guide.
UTF-8 Byte Prefix Rules
UTF-8 is clever. It looks at the first few bits of a byte and knows immediately how many bytes to read.
| Byte Prefix (Binary) | What it means | Example |
| 0xxxxxxx | One byte. This is plain English ASCII. | Letter ‘A’ = 0x41 |
| 110xxxxx | Start of a 2-byte character. | Latin ‘é’ = 0xC3 0xA9 |
| 1110xxxx | Start of a 3-byte character. | Most Telugu characters |
| 11110xxx | Start of a 4-byte character. | Emojis and rare characters |
| 10xxxxxx | Continuation byte. This follows a start byte. | Part of a multi-byte character |
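The prefix rules in the table can be sketched as a small Python function. `classify_byte` is a hypothetical helper written for this guide; it looks only at the leading bits and does not perform full UTF-8 validation (for instance, it does not reject overlong sequences).

```python
def classify_byte(b: int) -> str:
    """Classify one byte (0-255) by its UTF-8 bit prefix."""
    if b & 0b1000_0000 == 0b0000_0000:
        return "single"        # 0xxxxxxx: plain ASCII
    if b & 0b1100_0000 == 0b1000_0000:
        return "continuation"  # 10xxxxxx
    if b & 0b1110_0000 == 0b1100_0000:
        return "lead-2"        # 110xxxxx
    if b & 0b1111_0000 == 0b1110_0000:
        return "lead-3"        # 1110xxxx
    if b & 0b1111_1000 == 0b1111_0000:
        return "lead-4"        # 11110xxx
    return "invalid"

print([classify_byte(b) for b in "క".encode("utf-8")])
# ['lead-3', 'continuation', 'continuation']
```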
Telugu Character in UTF-8 Bytes (Practical Example)
Let us take the Telugu letter క (ka). This is one of the most common consonants in Telugu.
- Unicode code point: U+0C15
- Decimal number: 3093
- Binary: 0000 1100 0001 0101
- UTF-8 encoding (3 bytes): 0xE0 0xB0 0x95
- Hexadecimal you might see in a file: E0 B0 95
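To show where E0 B0 95 comes from, here is a hedged Python sketch that packs a code point into the 3-byte UTF-8 pattern by hand. `encode_3byte` is an illustrative helper, not a library function; it handles only the 3-byte range and, as a simplification, does not exclude surrogate code points.

```python
def encode_3byte(cp: int) -> bytes:
    """Pack a code point from U+0800..U+FFFF into 1110xxxx 10xxxxxx 10xxxxxx."""
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("code point is outside the 3-byte UTF-8 range")
    b1 = 0b1110_0000 | (cp >> 12)                # top 4 bits
    b2 = 0b1000_0000 | ((cp >> 6) & 0b11_1111)   # middle 6 bits
    b3 = 0b1000_0000 | (cp & 0b11_1111)          # low 6 bits
    return bytes([b1, b2, b3])

print(encode_3byte(0x0C15).hex(" "))                # e0 b0 95
print(encode_3byte(0x0C15) == "క".encode("utf-8"))  # True
```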
Now compare with Anu encoding:
Anu does not use the standard Unicode system. Instead, it places Telugu characters in the Private Use Area (PUA). The PUA is a special range of code points that Unicode set aside for people to use however they want.
The same Telugu క in Anu might be mapped to U+F0A3. In bytes, that is 0xEF 0x82 0xA3.
When you open an Anu-encoded file on a modern computer that does not have the Anu font, the computer reads 0xEF 0x82 0xA3 as valid UTF-8 and decodes it to U+F0A3. Because that code point has no standard meaning, you get a box or an unrelated symbol, not Telugu క.
💡 Tip 4: The Private Use Area (U+E000 to U+F8FF) contains 6,400 code points reserved for custom character assignments. Anu Script occupies approximately 200 of these for Telugu glyphs. No other font or system understands these code points unless it was specifically designed for Anu.
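A quick way to spot Private Use Area characters in a string is a range check, sketched below in Python. `is_private_use` is a made-up helper name; U+F0A3 is the example code point from the text above, and `unicodedata.category` is part of the standard library.

```python
import unicodedata

PUA_START, PUA_END = 0xE000, 0xF8FF

def is_private_use(ch: str) -> bool:
    """True if the character lies in the BMP Private Use Area."""
    return PUA_START <= ord(ch) <= PUA_END

anu_ka = chr(0xF0A3)  # the example Anu code point for క from the text

print(PUA_END - PUA_START + 1)            # 6400
print(is_private_use(anu_ka))             # True
print(is_private_use("క"))                # False
print(anu_ka.encode("utf-8").hex(" "))    # ef 82 a3
print(unicodedata.category(anu_ka))       # Co (private use)
```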
Fact 5 – Source: Unicode Consortium
“The Private Use Area (U+E000 to U+F8FF) contains 6,400 code points that are reserved for custom character assignments and have no standardised meaning across different systems.”
Source: Unicode Consortium Private Use Area Chart
Telugu Script: Anu to Unicode Mapping Table
This is the most valuable section on this page. No other website provides this complete mapping table for Telugu Anu to Unicode conversion.
Use this table if you need to manually decode an Anu-encoded file, or if you want to understand exactly how the converter works.
| Anu PUA Code Point | Unicode Code Point | Telugu Character | Character Name |
| U+F087 | U+0C08 | ఈ | I vowel (long) |
| U+F080 | U+0C06 | ఆ | Aa vowel |
| U+F05C | U+0C32 | ల | La consonant |
| U+F0E1 | U+0C2F | య | Ya consonant |
| U+F02B | U+0C66 | ౦ | Zero digit |
| U+F0A3 | U+0C15 | క | Ka consonant |
| U+F0B0 | U+0C30 | ర | Ra consonant |
| U+F0C1 | U+0C35 | వ | Va consonant |
| U+F0D2 | U+0C2E | మ | Ma consonant |
| U+F0E8 | U+0C38 | స | Sa consonant |
| U+F06D | U+0C24 | త | Ta consonant |
| U+F07E | U+0C2C | బ | Ba consonant |
| U+F091 | U+0C2D | భ | Bha consonant |
| U+F04B | U+0C2F | య | Ya (alternate) |
💡 Tip 5: If you have a PDF with Anu-encoded text that you cannot select or copy, use this mapping table to manually decode it. Each U+Fxxx code point corresponds directly to a Telugu Unicode character. This is slow but reliable for the characters listed here.
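If you would rather not decode by hand, the table lookup can be sketched in a few lines of Python. This is a minimal sketch, not the converter used on this site: `ANU_TO_UNICODE` holds only a handful of rows copied from the table above, and `anu_to_unicode` and `unmapped` are hypothetical helper names. It does one-to-one replacement only and makes no attempt at conjuncts or reordering.

```python
# A few rows from the mapping table above (Anu PUA -> Telugu Unicode).
ANU_TO_UNICODE = {
    0xF087: 0x0C08,  # ఈ
    0xF080: 0x0C06,  # ఆ
    0xF05C: 0x0C32,  # ల
    0xF0A3: 0x0C15,  # క
    0xF0B0: 0x0C30,  # ర
    0xF0D2: 0x0C2E,  # మ
}

def anu_to_unicode(text: str) -> str:
    """Replace each mapped PUA code point; leave everything else unchanged."""
    return text.translate(ANU_TO_UNICODE)

def unmapped(text: str) -> list:
    """List PUA characters in the text that the table does not cover."""
    return sorted({ch for ch in text
                   if 0xE000 <= ord(ch) <= 0xF8FF
                   and ord(ch) not in ANU_TO_UNICODE})

print(anu_to_unicode("\uf0a3\uf0b0"))  # కర
```

Running `unmapped` on your text first tells you which code points still need a table entry before you trust the output.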
Unicode vs Non-Unicode Data Types in SQL Server (SSIS Fix Included)
If you work with SQL Server or SSIS (SQL Server Integration Services), you have probably seen this error:
“Cannot convert between unicode and non-unicode string data types”
This error stops your entire data flow. Here is why it happens and how to fix it. For a complete developer-focused guide, read Unicode vs Non-Unicode in SQL Server for Developers.
Why the SSIS Error Happens
SSIS has two different string data types:
- DT_WSTR – Unicode string (same as nvarchar in SQL Server)
- DT_STR – Non-Unicode string (same as varchar in SQL Server)
When your source component (like a Flat File Source or Excel Source) outputs DT_WSTR (Unicode), but your destination (like OLE DB Destination) expects DT_STR (non-Unicode), SSIS refuses to convert automatically. It throws an error instead.
Fix 1 – Data Conversion Transformation (Recommended)
This is the simplest method. No SQL writing required.
- Open your SSIS package and go to the Data Flow tab.
- Drag a Data Conversion Transformation between your source and destination.
- Double-click the Data Conversion Transformation to open it.
- Select the columns that are causing the error (they will be DT_WSTR type).
- Change the output data type to DT_STR.
- Set the code page to 1252 for Western European text (including US English) or 65001 for UTF-8. Code page 1252 cannot hold Telugu characters.
- Click OK.
- In the destination’s Column Mapping, map the converted columns (they will have “Copy of” in the name).
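Before choosing a code page, it can help to check which characters in your data would not survive the conversion. The Python sketch below is one way to do that outside SSIS. It assumes Python's `cp1252` and `utf-8` codecs as stand-ins for SSIS code pages 1252 and 65001, and `lost_characters` is a hypothetical helper name.

```python
def lost_characters(text: str, codec: str) -> list:
    """Return the characters in text that the target codec cannot encode."""
    lost = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost

print(lost_characters("café", "cp1252"))   # []
print(lost_characters("క", "cp1252"))      # ['క']
print(lost_characters("క", "utf-8"))       # []
```

An empty list means the column is safe to convert to that code page; anything else would be lost or replaced in the DT_STR output.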
Fix 2 – Derived Column Transformation
Add a Derived Column Transformation and use this expression:
(DT_STR, 100, 1252)ColumnName
Replace ColumnName with your actual column name. The number 100 is the length. Adjust it based on your data.
Fix 3 – SQL Command in OLE DB Source
Change the Data Access Mode to “SQL Command” and write:
SELECT CAST(ColumnName AS VARCHAR(100)) AS ColumnName FROM YourTable
Fact 6 – Source: Microsoft SQL Server Documentation
“Implicit conversion from varchar to nvarchar in SQL Server prevents index seeks, forcing table scans. On a table with 10 million rows, this can increase query time from 50 milliseconds to over 5 seconds.”
Source: Microsoft SQL Server Data Type Conversion
SQL Server Data Type Comparison Table
| Data Type | Unicode? | Storage per character | Max length | When to use |
| varchar(n) | No | 1 byte | 8,000 characters | English only, legacy systems |
| nvarchar(n) | Yes | 2 bytes | 4,000 characters | Multiple languages, modern apps |
| char(n) | No | 1 byte (fixed) | 8,000 characters | Fixed-length codes (ISO, country codes) |
| nchar(n) | Yes | 2 bytes (fixed) | 4,000 characters | Fixed-length Unicode (rare) |
💡 Tip 6: In SSIS, always place the Data Conversion Transformation as close to the source as possible. Converting early reduces string length mismatches downstream. If you convert too late, you might have already lost data.
How to Fix “Language for Non-Unicode Programs” in Windows
Have you ever opened an old Telugu software and seen boxes (□□□) instead of text? Or question marks (???) everywhere?
This happens because of a Windows setting called “Language for non-Unicode programs.” We have a complete step-by-step guide on fixing Language for Non-Unicode Programs in Windows.
What This Setting Does
When an old program (like PageMaker 7, CorelDraw 7, or AnuScript) does not support Unicode, Windows asks: “Which code page should I use to convert bytes into characters?”
If the code page does not match what the program expects, you see mojibake (garbled text) or tofu (□□□ boxes).
Step-by-Step Fix for Windows 10 and Windows 11
- Open Settings (press Windows key + I).
- Go to Time & Language.
- Click Language & Region.
- Scroll down and click Administrative language settings (under Related settings).
- In the Region dialog box, click the Administrative tab.
- Click Change system locale (under Language for non-Unicode programs).
- Select Telugu (India) or whatever language matches your non-Unicode software.
- Click OK.
- Restart your computer.
After restarting, open your old Telugu software again. The text should now appear correctly.
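To confirm which code page non-Unicode programs will get, you can query it from a script. The sketch below is an assumption-laden example: on Windows it calls the Win32 `GetACP` function through `ctypes`, and on other systems it falls back to Python's `locale.getpreferredencoding`. The `ansi_code_page` function name is invented for this example.

```python
import locale
import sys

def ansi_code_page() -> str:
    """Report the default legacy (ANSI) code page as a short string."""
    if sys.platform == "win32":
        import ctypes
        # GetACP returns the active ANSI code page number, e.g. 1252.
        return f"cp{ctypes.windll.kernel32.GetACP()}"
    # Fallback for non-Windows systems, where the setting does not exist.
    return locale.getpreferredencoding(False)

print(ansi_code_page())
```

Run it before and after changing the system locale; the reported value should change once you have restarted.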
💡 Tip 7: This setting does NOT affect modern web browsers or Unicode apps. It only changes the default (ANSI) code page used by older applications that rely on the non-Unicode Windows APIs. Your web browser and Microsoft Word will continue working normally.
Fact 7 – Source: Microsoft Windows Documentation
“The system locale (Language for non-Unicode programs) determines which code page the system uses when converting non-Unicode text to Unicode for compatibility with older applications. Changing this setting requires a system restart.”
Source: Microsoft Windows System Locale Documentation
Unicode vs Non-Unicode in SAP Systems
For enterprise users working with SAP, this section is for you. For a detailed guide, see Unicode vs Non-Unicode in SAP Systems.
Every SAP installation is configured as either:
- Unicode system – Uses UTF-16 encoding. Supports all languages simultaneously.
- Non-Unicode system – Uses language-specific code pages. Limited to one language family.
Important: SAP S/4HANA requires a Unicode system. If your organisation is still running a non-Unicode SAP ERP system, you must complete a full Unicode conversion before migrating to S/4HANA.
What Happens During SAP Unicode Conversion
- The conversion typically increases database size by 30% to 70% because UTF-16 uses more bytes per character than single-byte code pages.
- All custom ABAP programs must be scanned for Unicode compliance using report UCCHECK.
- Cluster tables need cleanup using report SDBI_CLUSTER_CHECK.
- Match Code IDs must be deleted using report TWTOOL01.
- The export and import use R3load, SAP’s database export and import tool.
- All third-party products must be checked for Unicode compliance.
Fact 8 – Source: SAP Help Portal
“SAP S/4HANA requires a Unicode system. Non-Unicode SAP ERP installations must complete a full Unicode conversion before migration. The conversion typically increases database size by 30–70% due to UTF-16 encoding.”
Source: SAP Help Portal Unicode Conversion
Best Practices and Next Steps
Now that you understand how encoding works, here is what you should do going forward.
When to Use Unicode (Almost Always)
| Scenario | Use Unicode? | Why |
| Creating a new website | Yes | The web requires UTF-8. |
| Building a mobile app | Yes | iOS and Android fully support Unicode. |
| Creating a new SQL Server table | Yes | Use nvarchar for any column that might have non-English text. |
| Sending email to international recipients | Yes | Modern email supports UTF-8. |
| Saving a Word document | Yes | Modern Word uses Unicode by default. |
When You Might Need Non-Unicode
| Scenario | Use Non-Unicode? | Why |
| Editing an old PageMaker file | ⚠️ Yes (Anu) | Old PageMaker does not support Unicode. |
| Filling a government form that requires Anu 7.0 | ⚠️ Yes (Anu) | Some government systems still use legacy fonts. |
| Working with a printing press that only accepts non-Unicode fonts | ⚠️ Yes | Many small printing shops in India still run old software. |
Five Action Steps to Take Today
- Convert all your old Anu documents to Unicode. Do this once. Store the Unicode versions as your master copies. Keep the Anu versions only if you need to edit them in old software. Use the Unicode to Anu Converter for accurate Telugu conversion.
- Stop creating new documents in non-Unicode. Every new document you create should be in Unicode (UTF-8). Use Google Indic Keyboard or Microsoft InScript to type Telugu directly.
- Update your Windows system locale if you still use old Telugu software. Follow our guide on Language for Non-Unicode Programs in Windows.
- Fix your SSIS packages if you see the “cannot convert between unicode and non-unicode” error. Read Unicode vs Non-Unicode in SQL Server for Developers for detailed fixes.
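- Check your SQL Server queries for implicit conversions. If you compare a varchar column with an nvarchar value (such as an N'text' literal or an nvarchar parameter), SQL Server silently converts the column side and may skip index seeks. Match the literal to the column type: use the N'text' prefix for nvarchar columns and plain 'text' for varchar columns.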
Frequently Asked Questions (FAQ)
What is the difference between Unicode and non-Unicode characters?
Unicode characters follow a universal standard where every character from every language has its own unique code point. Non-Unicode characters use older, platform-specific or font-specific code pages that hold only about 256 characters each. Unicode supports all languages simultaneously. Non-Unicode supports just one language or region per code page. For a complete comparison, see Unicode vs Non-Unicode Key Differences Explained Simply.
Is nvarchar Unicode or non-Unicode?
nvarchar is a Unicode data type in SQL Server. The “n” stands for “National” – National Character Varying. It stores text using Unicode encoding, typically requiring 2 bytes per character. Its non-Unicode counterpart is varchar, which uses 1 byte per character and is limited to a single code page. Learn more in Unicode vs Non-Unicode in SQL Server for Developers.
Can we convert Unicode to non-Unicode without losing data?
It depends entirely on the characters involved. Conversion is lossless only when every character in your Unicode text has a matching equivalent in the target non-Unicode encoding. Characters that fall outside the target code page’s range will be lost, truncated, or replaced with question marks. Diacritics and special punctuation are the most vulnerable. Always verify your output after any conversion.
Why does SSIS show “cannot convert between unicode and non-unicode”?
This error occurs because the source component outputs strings as DT_WSTR (Unicode), while the destination expects DT_STR (non-Unicode). SSIS does not perform this conversion automatically. To fix it, add a Data Conversion Transformation between your source and destination to explicitly convert from DT_WSTR to DT_STR with the appropriate code page. See our SQL Server developer guide for step-by-step instructions.
What is the best Unicode to non-Unicode converter for Telugu?
For Telugu text conversion, the converter at the top of this page supports conversion to Anu Script (Anu 7.0), Eemaata format, and other legacy Telugu fonts. For Anu-specific conversion, use our dedicated Unicode to Anu Converter. The key is accurate character mapping for Telugu conjunct characters, half-forms, and special aksharas.
What is the Private Use Area (PUA) in Unicode?
The Private Use Area (PUA) is a range of code points from U+E000 to U+F8FF that Unicode reserved for custom character assignments. Legacy font systems like Anu Script place their characters in the PUA. This means those characters have no standard meaning outside that specific font. Learn more in how Unicode works in computers.
What is a Byte Order Mark (BOM)?
A Byte Order Mark (BOM) is a special character (U+FEFF) placed at the beginning of a text file to indicate which Unicode encoding and byte order was used. UTF-8 BOM is EF BB BF. UTF-16 little-endian BOM is FF FE. UTF-16 big-endian BOM is FE FF.
Can I use non-Unicode fonts on websites?
No. Non-Unicode fonts should not be used for web content. The web runs on Unicode (UTF-8), and browsers cannot correctly interpret non-Unicode character positions. If you embed a non-Unicode font on a website, visitors who do not have that exact font installed will see incorrect characters. Always use Unicode web-safe fonts for the web. Reserve non-Unicode fonts for offline print, DTP, and legacy applications.
How do I type Telugu Unicode directly without conversion?
Use Google Indic Keyboard (phonetic typing) or Microsoft InScript (physical keyboard layout). Both are free. With phonetic typing, you type the sounds in Latin letters and the software converts them to Telugu Unicode automatically. With InScript, each key produces a Telugu character directly. No conversion is needed afterwards.
What is mojibake?
Mojibake is a Japanese word (文字化け) that means “character transformation.” It describes the garbage text you see when a computer reads a file using the wrong encoding. For example, seeing F087 F080 F05C instead of అమ్మ.
Why does my Telugu text show as boxes (□□□) in old software?
Your Windows system locale is set to a language that does not match your Telugu non-Unicode software. Follow our guide on Language for Non-Unicode Programs in Windows to fix this.
What is endianness in Unicode encoding?
Endianness refers to the order in which bytes are stored in computer memory. Little-endian stores the smallest (least significant) byte first. Big-endian stores the largest (most significant) byte first. UTF-16 has two variants: UTF-16LE (little-endian, used by Windows) and UTF-16BE (big-endian, used by some Unix systems).