
Understanding the Problem: Why Does Excel Break My Text?
What Is Unicode in Excel?
By default, modern versions of Excel store all text data in Unicode (UTF-16) encoding. This means every character — whether it is an English letter, a Telugu vowel, an Urdu consonant, or an emoji — is stored using the universal Unicode Standard, which supports over 154,000 characters across 168 scripts.
What Is Non-Unicode in Excel?
Non-Unicode encoding uses older, language-specific code pages (like Windows-1252 for Western European languages or ISCII for Indian scripts). A single-byte code page can represent at most 256 characters. When Excel exports or imports data using non-Unicode encoding, any character that does not exist in the target code page is replaced with a question mark (?) or a placeholder box.
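The replacement behavior can be illustrated outside Excel. The following is a minimal Python sketch, assuming Python's built-in cp1252 codec is a reasonable stand-in for the Windows-1252 code page; the helper name to_code_page is hypothetical, and Excel's own conversion may differ in details.

```python
# Encode text into a single-byte code page, substituting "?" for
# anything the code page cannot represent (errors="replace").
def to_code_page(text: str, code_page: str = "cp1252") -> bytes:
    return text.encode(code_page, errors="replace")

print(to_code_page("café"))    # é exists in Windows-1252, so it survives
print(to_code_page("తెలుగు"))  # Telugu does not, so each character becomes "?"
```

Once the question marks are written, nothing in the output bytes records what the original characters were.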
The Core Problem
When you:
- Save an Excel file as CSV, Excel may default to your system’s non-Unicode code page
- Import a CSV file into Excel, Excel may open it in non-Unicode mode
- Copy text between Excel and a legacy application, encoding mismatches cause garbled output
- Use SSIS or other ETL tools to import Excel data into SQL Server, data type mismatches (DT_WSTR vs DT_STR) throw blocking errors
Method 1: Save Excel as UTF-8 CSV (Recommended for Web Use)
This is the standard approach when you need to export Excel data containing Unicode characters for use on websites, APIs, or modern applications.
Step 1: Open Your Excel Workbook
Open the Excel file containing your Unicode text data.
Step 2: Go to File → Save As
Click the File tab, then select Save As from the left sidebar.
Step 3: Choose the Save Location
Select the folder where you want to save the file.
Step 4: Select CSV UTF-8 Format
In the “Save as type” dropdown menu, select CSV UTF-8 (Comma delimited) (*.csv).
⚠ Critical: Do NOT select the regular “CSV (Comma delimited) (*.csv)” option. The regular CSV format uses your system’s default non-Unicode code page and will corrupt any characters outside that code page.
Step 5: Save the File
Click Save. Excel may display a warning about features not being compatible with CSV format — click Yes to proceed.
Step 6: Verify the Output
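Open the saved CSV file in Notepad++ (not regular Notepad) and check the encoding in the bottom-right status bar. It should say UTF-8-BOM, because Excel writes a byte order mark at the start of CSV UTF-8 files (some older Notepad++ versions label this simply UTF-8). Scroll through the file to confirm all your Unicode characters are intact.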
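The same check can be scripted. This is a minimal Python sketch, assuming the export begins with the UTF-8 byte order mark (EF BB BF) that Excel's CSV UTF-8 option writes; the function name and the sample data are hypothetical.

```python
import codecs

def looks_like_excel_utf8(raw: bytes) -> bool:
    """True if raw starts with the UTF-8 BOM and decodes cleanly as UTF-8."""
    if not raw.startswith(codecs.BOM_UTF8):
        return False
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# In practice the bytes would come from the exported file, opened in binary mode.
sample = "name,city\nరవి,హైదరాబాద్\n".encode("utf-8-sig")
print(looks_like_excel_utf8(sample))
```

A file that fails this check is not necessarily broken: UTF-8 without a BOM is also valid, it just was not produced by this particular Excel option.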
Method 2: Export Excel Data as Non-Unicode CSV
If you specifically need non-Unicode output (for legacy systems, older DTP software, or code page-specific applications), follow these steps.
Step 1: Open Your Excel File
Step 2: Prepare Your Data — Make sure your Unicode text is correctly displayed in Excel. If you see question marks already, the data may have been imported incorrectly in the first place.
Step 3: Go to File → Save As
Step 4: Select CSV (Comma Delimited) Format — In the “Save as type” dropdown, select CSV (Comma delimited) (*.csv) — NOT the UTF-8 option.
Step 5: Save and Accept Warnings — Click Save. Excel will warn you that some features may be lost. Accept the warnings.
Step 6: Verify Code Page Encoding — Open the CSV file in Notepad++. Check the encoding — it should show ANSI (or your system’s default code page). Verify that characters within the code page range are correct, and note which characters were replaced with ? or boxes.
⚠ Warning About Data Loss — Non-Unicode CSV export is lossy. Any character that does not exist in the target code page will be permanently replaced. Always keep a backup of your original Unicode Excel file.
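Before exporting, it can help to know exactly which characters will be lost. The sketch below is a hedged Python illustration, assuming the target is Windows-1252 as modeled by Python's cp1252 codec; unmappable_chars is a hypothetical helper, not an Excel feature.

```python
def unmappable_chars(text: str, code_page: str = "cp1252") -> list[str]:
    """Return the distinct characters in text that code_page cannot encode."""
    lost = []
    for ch in text:
        try:
            ch.encode(code_page)
        except UnicodeEncodeError:
            if ch not in lost:
                lost.append(ch)
    return lost

# The euro sign is part of Windows-1252; the Telugu characters are not.
print(unmappable_chars("Price: €5 for చాయ్"))
```

An empty list means the text should survive a non-Unicode export to that code page unchanged.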
Method 3: Import Non-Unicode CSV into Excel Correctly
One of the most common causes of garbled text in Excel is importing a non-Unicode CSV file incorrectly. Here is the right way to do it.
Step 1: Open Excel (Do Not Double-Click the CSV) — Open Excel first. Do not double-click the CSV file to open it directly: unless the file starts with a byte order mark, Excel typically assumes your system's default code page, which garbles UTF-8 text.
Step 2: Go to Data → Get Data → From Text/CSV — On the Data tab, click Get Data (or From Text in older Excel versions).
Step 3: Select Your CSV File — Browse to and select the CSV file you want to import.
Step 4: Set the Correct File Origin (Code Page) — In the Text Import Wizard (newer Excel versions show a preview dialog with the same File Origin dropdown instead):
- For Step 1, select Delimited
- In the File Origin dropdown, select the code page matching your file:
  - 65001: Unicode (UTF-8), for UTF-8 encoded files
  - 1252: Western European (Windows), for ANSI/Western European files
  - 57002 to 57011: ISCII code pages, for ISCII-based Indian language files (for example, 57002 for Devanagari and 57005 for Telugu)
  - Other code pages, selected based on the source language
Step 5: Choose Delimiters — In Step 2, select the delimiter used in your CSV file (comma, semicolon, tab, etc.).
Step 6: Set Column Data Types — In Step 3, you can specify the data type for each column. For text columns containing Unicode characters, ensure they are set to Text (not General).
Step 7: Click Finish — Excel will import the data with correct encoding. Verify that all characters display properly.
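Why the File Origin setting matters can be seen by decoding the same bytes two ways. This is a minimal Python sketch with made-up sample data; read_csv_bytes is a hypothetical helper that plays the role of the import dialog.

```python
import csv
import io

def read_csv_bytes(raw: bytes, encoding: str) -> list[list[str]]:
    """Decode raw CSV bytes with an explicit encoding, then parse the rows."""
    text = raw.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))

raw = "id,name\n1,José\n".encode("utf-8")
print(read_csv_bytes(raw, "utf-8"))   # matching origin: the name is intact
print(read_csv_bytes(raw, "cp1252"))  # wrong origin: é becomes two junk characters
```

The bytes in the file never change; only the code page used to interpret them does, which is why re-importing with the right File Origin repairs this kind of garbling.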
Method 4: Using Notepad++ to Fix Encoding Before Import
Sometimes the fastest way to fix encoding issues is to pre-process your CSV file in Notepad++ before opening it in Excel.
Step 1: Open the CSV File in Notepad++ — Right-click your CSV file → Edit with Notepad++.
Step 2: Check Current Encoding — Look at the bottom-right corner of Notepad++. It will show the current encoding (UTF-8, ANSI, etc.).
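Step 3: Convert to UTF-8 — Go to Encoding → Convert to UTF-8-BOM (labeled Convert to UTF-8 in some older Notepad++ versions). The byte order mark helps Excel recognize the file as UTF-8. The conversion starts from the encoding shown in Step 2, so confirm that the text displays correctly before converting.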
Step 4: Save the File — Press Ctrl + S to save.
Step 5: Open in Excel — Now open the file in Excel using the Data → From Text/CSV method (Method 3 above). Select 65001: Unicode (UTF-8) as the File Origin.
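For many files, the same conversion can be scripted. The following is a rough Python sketch of what the Notepad++ step does, assuming you already know the source code page (cp1252 here is only an example) and want a byte order mark in the output; convert_to_utf8_bom is a hypothetical name.

```python
def convert_to_utf8_bom(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8 with a leading BOM."""
    # Decoding may raise UnicodeDecodeError if the bytes are invalid in that encoding.
    text = raw.decode(source_encoding)
    # The "utf-8-sig" codec prepends the bytes EF BB BF when encoding.
    return text.encode("utf-8-sig")

legacy = "Müller,Zürich\n".encode("cp1252")
converted = convert_to_utf8_bom(legacy, "cp1252")
print(converted[:3])
print(converted.decode("utf-8-sig"))
```

If the source encoding is guessed wrong, the script will usually still run but produce garbled text, so spot-check the result just as you would in Notepad++.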
Method 5: Excel VBA Macro for Unicode Conversion
For advanced users who need to automate Unicode/non-Unicode conversion across multiple files, a VBA macro provides the most control.
Create the Macro
Step 1: Open Excel and press Alt + F11 to open the VBA editor.
Step 2: Insert a new module (Insert → Module).
Step 3: Paste the following code:
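```vba
Sub ConvertUnicodeToNonUnicode()
    ' Replaces every character above U+00FF in the active sheet's text cells with "?".
    Dim ws As Worksheet
    Dim cell As Range
    Dim src As String
    Dim result As String
    Dim ch As String
    Dim charCode As Long
    Dim i As Long

    Set ws = ActiveSheet
    For Each cell In ws.UsedRange
        ' Only touch plain text constants; skip formulas, numbers, dates, and errors
        If Not cell.HasFormula And VarType(cell.Value) = vbString Then
            src = cell.Value
            result = ""
            For i = 1 To Len(src)
                ch = Mid$(src, i, 1)
                ' AscW returns a signed Integer; mask it to get 0-65535
                charCode = AscW(ch) And &HFFFF&
                If charCode < 256 Then
                    result = result & ch ' Keep the character unchanged
                Else
                    result = result & "?" ' Replace non-mappable characters
                End If
            Next i
            cell.Value = result
        End If
    Next cell
    MsgBox "Conversion complete!", vbInformation
End Sub
```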
Step 4: Run the macro (F5) on the active worksheet.
⚠ Important Notes About This Macro:
- This macro replaces any Unicode character with a code point above 255 with a question mark
- It operates on the active worksheet only — modify the range as needed
- Always run on a copy of your data first to verify results
- The macro does not preserve rich-text formatting applied to part of a cell (bold or colored runs inside the text); rewriting the cell value discards it
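One consequence of the "code point above 255" rule is worth spelling out: it corresponds to Latin-1, not to Windows-1252. Characters such as the euro sign and curly quotes exist in Windows-1252 but have code points above 255, so the macro replaces them even though a Windows-1252 export could keep them. The Python sketch below illustrates the difference; both function names are hypothetical, and Python's cp1252 codec stands in for the Windows code page.

```python
def macro_style(text: str) -> str:
    """Mirror the macro's rule: keep code points below 256, else "?"."""
    return "".join(ch if ord(ch) < 256 else "?" for ch in text)

def fits_cp1252(ch: str) -> bool:
    """True if the character can be encoded in Windows-1252."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

print(macro_style("€10 “quoted” café"))
print([ch for ch in "€“”" if fits_cp1252(ch)])
```

If those characters matter to you, the macro's threshold test would need to be replaced with a lookup against the actual target code page.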
Method 6: Excel Power Query for Encoding Conversion
Excel’s Power Query feature provides a robust, repeatable way to handle encoding conversion during data import.
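Step 1: Open Power Query Editor — Go to Data → Get Data → From File → From Text/CSV, select your text or CSV file, and click Transform Data to open it in the Power Query Editor. (Native Excel workbooks are already Unicode and have no encoding setting.)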
Step 2: Set Encoding in the Source Step — In Power Query Editor, go to the Applied Steps pane on the right. Click the gear icon next to the Source step.
Step 3: Select the Correct Encoding — Choose the appropriate encoding from the dropdown:
- 65001 (UTF-8) for Unicode files
- 1252 (Windows Western European) for non-Unicode Western files
- Custom code page for other languages
Step 4: Transform and Load — Apply any additional transformations, then click Close & Load to import the data into Excel with correct encoding.
Method 7: Fix SSIS Excel Import Errors
When importing Excel data into SQL Server via SSIS, the “Cannot convert between Unicode and non-Unicode string data types” error is extremely common. Here is how to fix it within your Excel-to-SQL workflow.
The Problem
Excel Source outputs data as DT_WSTR (Unicode string). Your SQL Server destination table uses VARCHAR (non-Unicode). SSIS refuses to auto-convert between these types.
Solution: Add Data Conversion Transformation
Step 1: In your SSIS Data Flow, drag a Data Conversion Transformation between the Excel Source and OLE DB Destination.
Step 2: Open the Data Conversion editor.
Step 3: Select each Unicode column and set the output type to DT_STR with the appropriate code page (1252 for English).
Step 4: Map the converted output columns (named “Copy of X”) to your destination table columns.
Step 5: Execute the package. The error should be resolved.
Alternative Solution: Change Excel Source Output
Right-click the Excel Source → Show Advanced Editor → Input and Output Properties tab → select the column under External Columns → change the DataType from DT_WSTR to DT_STR.
Note: This approach is less recommended because Excel data is inherently Unicode. Forcing non-Unicode at the source level can cause silent data corruption.
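Before converting columns to DT_STR, it may be worth checking which columns actually contain characters outside the target code page. The following is a hedged Python sketch that runs outside SSIS; it assumes the rows have already been read into dictionaries (the sample rows are invented) and that code page 1252 is the target.

```python
def columns_needing_unicode(rows: list[dict[str, str]], code_page: str = "cp1252") -> set[str]:
    """Return names of columns holding at least one value code_page cannot encode."""
    flagged = set()
    for row in rows:
        for column, value in row.items():
            try:
                value.encode(code_page)
            except UnicodeEncodeError:
                flagged.add(column)
    return flagged

rows = [
    {"sku": "A-100", "title": "Green tea"},
    {"sku": "A-101", "title": "గ్రీన్ టీ"},
]
print(columns_needing_unicode(rows))
```

Any column flagged this way would lose data in a DT_STR conversion, so it is a candidate for a Unicode destination column (NVARCHAR) instead.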
Troubleshooting Common Excel Unicode Issues
Issue 1: Characters Show as Question Marks (?)
Cause: The file was saved or imported using a non-Unicode code page that does not support the characters.
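Fix: If the source file is UTF-8, re-import it using File Origin → 65001: Unicode (UTF-8) in the Text Import Wizard. If the file itself already contains literal question marks, the characters were lost when it was saved, so export it again from the original Unicode workbook.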
Issue 2: Characters Show as Boxes or Squares
Cause: Excel does not have a font installed that can render the Unicode characters.
Fix: Install a Unicode font that supports your script (e.g., Noto Sans for most languages, Arial Unicode MS for broad coverage, or Gautami for Telugu). Then select that font for the affected cells.
Issue 3: Text Reverses After Conversion
Cause: Right-to-left scripts (Urdu, Arabic, Hebrew) may reverse when improperly handled by non-Unicode encoding.
Fix: Always use UTF-8 encoding for RTL scripts. If you must use non-Unicode, select the code page that matches the script (for example, Windows-1256 for Arabic-script languages or Windows-1255 for Hebrew).
Issue 4: Formula Results Show Incorrect Characters
Cause: The UNICHAR() function returns a Unicode character, but the font applied to the cell cannot display it.
Fix: Select a Unicode-compatible font for the cell, and keep the formula cell formatted as General. A cell formatted as Text displays the formula itself instead of its result. Use =UNICODE(A1) to verify the character code.
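To cross-check a code point outside Excel, Python's built-in ord and chr perform the same lookups. This is a small illustrative sketch; the wrapper names are hypothetical and only mirror the worksheet functions by analogy.

```python
def unicode_value(text: str) -> int:
    """Code point of the first character, analogous to =UNICODE(A1)."""
    return ord(text[0])

def unichar(code_point: int) -> str:
    """Character for a code point, analogous to =UNICHAR(n)."""
    return chr(code_point)

print(unicode_value("అ"))                # Telugu letter A, U+0C05
print(unichar(0x0C05))
print(f"U+{unicode_value('€'):04X}")     # hexadecimal form, as used in Unicode charts
```

If the number matches what =UNICODE(A1) reports, the character is stored correctly and any display problem lies with the font rather than the data.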
Frequently Asked Questions
What is the difference between CSV UTF-8 and regular CSV in Excel?
CSV UTF-8 encodes text using the UTF-8 standard, which can represent every Unicode character. Regular CSV uses your system’s default code page (usually Windows-1252 for English systems), which is a single-byte code page limited to 256 characters. If your data contains characters outside that code page (Telugu, Urdu, Chinese, etc.), regular CSV will corrupt them.
Can Excel handle Urdu and Hindi text properly?
Yes, modern Excel (2016 and later) fully supports Unicode, including Urdu and Hindi text. The key is to keep the data in a Unicode format (.xlsx or CSV UTF-8) and to have appropriate fonts installed. For Urdu, install Noto Nastaliq Urdu or Jameel Noori Nastaleeq. For Hindi, install Mangal or Noto Sans Devanagari.
Why does SSIS throw Unicode conversion errors when importing Excel files?
Excel stores all text as Unicode by default. When your SSIS destination table uses non-Unicode columns (VARCHAR, CHAR), the data types do not match. You must insert a Data Conversion Transformation in your SSIS pipeline to convert from DT_WSTR (Unicode) to DT_STR (non-Unicode).
How do I check what encoding my Excel file is using?
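Native Excel workbooks (.xlsx) always store text as Unicode, so the question only arises for exported text files such as CSV. Open the CSV file in Notepad++ and check the encoding shown in the status bar or the Encoding menu. If you created the file in Excel by using “Save As → CSV UTF-8,” it is UTF-8 encoded with a byte order mark. If you used “Save As → CSV (Comma delimited),” it uses your system’s default ANSI code page.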
Conclusion
Unicode to non-Unicode conversion in Excel is a common challenge, but it is entirely manageable with the right approach. The golden rules are simple: always use UTF-8 encoding when possible, verify your output before sharing, and never overwrite your original Unicode data. For additional conversion support — especially for Indian language text moving between Unicode and legacy non-Unicode font systems — visit unicode-to-nonunicode.com to use our free online converter tools.