Unicode vs Non-Unicode
in SQL Server —
A Developer’s Complete Reference
Data type mismatches between Unicode and non-Unicode columns are behind some of the most common and costly SQL Server performance problems. This guide explains the difference, the traps, and the fixes.
- Why Encoding Matters Inside SQL Server
- Complete Unicode vs Non-Unicode Data Type Reference
- nvarchar vs varchar — The Practical Decision
- Implicit Conversion — The Hidden Performance Killer
- The N Prefix — Why One Character Changes Everything
- SSIS Unicode Conversion Error — Causes and Fixes
- SQL Server Unicode Best Practices
SQL Server stores text in two completely different ways depending on whether the column is defined as a Unicode or non-Unicode type. The gap between these two approaches is not just about character capacity — it affects storage size, query plan selection, index utilization, interface compatibility, and the behavior of your entire data integration pipeline.
Developers who do not understand this distinction often introduce performance bugs that only manifest at scale — slow queries that run fine on small tables but become painfully expensive as data grows. This guide makes the distinction concrete with real examples.
1. Why Encoding Matters Inside SQL Server
When you store the string “Hyderabad” in a SQL Server column, the database does not store the letters — it stores numbers. The encoding determines which numbers. A non-Unicode column stores one byte per character using the database’s configured code page. A Unicode column stores two bytes per character using UTF-16 encoding.
For English text, both systems produce readable results — but they are storing fundamentally different byte sequences. When you mix these two types in queries, joins, or comparisons, SQL Server has to resolve the mismatch — and that resolution has real performance costs.
2. Complete Unicode vs Non-Unicode Data Type Reference
| Data Type | Unicode? | Length | Storage | Max Size | Status |
|---|---|---|---|---|---|
char(n) | Non-Unicode | Fixed | 1 byte/char | 8,000 bytes | Active |
varchar(n) | Non-Unicode | Variable | 1 byte/char | 8,000 bytes | Active |
varchar(max) | Non-Unicode | Variable | 1 byte/char | 2 GB | Active |
text | Non-Unicode | Variable | 1 byte/char | 2 GB | ⚠ Deprecated |
nchar(n) | Unicode | Fixed | 2 bytes/char | 8,000 bytes | Active |
nvarchar(n) | Unicode | Variable | 2 bytes/char | 8,000 bytes | Active |
nvarchar(max) | Unicode | Variable | 2 bytes/char | 2 GB | Active |
ntext | Unicode | Variable | 2 bytes/char | 2 GB | ⚠ Deprecated |
The “n” prefix in nvarchar and nchar stands for National Character — terminology borrowed directly from the ISO/IEC SQL standard. It signals Unicode storage. The deprecated types (text and ntext) should not be used in new code — use varchar(max) and nvarchar(max) instead.
3. nvarchar vs varchar — The Practical Decision
The core question every database designer faces is: when should a column be nvarchar versus varchar? The answer depends on three factors: what characters the data actually contains, where the data comes from, and how much storage you can afford.
Use nvarchar (Unicode) when:
- The column will store text in any non-Latin script — Arabic, Chinese, Japanese, Hindi, Telugu, Bengali, etc.
- The data comes from user input via a web form, mobile app, or modern API (all Unicode sources)
- The column stores names, addresses, or descriptions that users might enter in any language
- The data will be exported to web services, JSON, or XML (all Unicode formats)
- You are building a multilingual application or SaaS product
Use varchar (Non-Unicode) when:
- The column stores strictly ASCII data — codes, identifiers, postal codes, URLs, email addresses
- Storage optimization is critical and you can guarantee the character set will never expand
- The column stores internal system values that users never input directly
- You are interfacing with a legacy system that only sends single-byte encoded data
4. Implicit Conversion — The Hidden Performance Killer
This is the most important section in this guide for SQL Server performance. Implicit conversion happens when SQL Server automatically converts one data type to another to resolve a comparison or join — and it happens silently, without warning, in your query execution plan.
Here is the scenario: you have a table with an nvarchar column named CustomerEmail. You write a query with a literal string:
FROM Customers
WHERE CustomerEmail = ‘user@example.com’
— SQL Server sees: nvarchar column vs varchar literal
— It converts the varchar literal to nvarchar
— This prevents index seek → forces index scan
— On 10M rows: this is the difference between 1ms and 8 seconds
FROM Customers
WHERE CustomerEmail = N’user@example.com’
— N prefix makes this a Unicode nvarchar literal
— Types match: nvarchar column vs nvarchar literal
— SQL Server can use index seek → millisecond response
You can detect implicit conversions in your queries by examining execution plans. Look for the yellow warning triangle icon on operators — hover over it, and if you see “Type conversion in expression may affect CardinalityEstimate” or “CONVERT_IMPLICIT”, you have an implicit conversion problem that needs fixing.
5. The N Prefix — Why One Character Changes Everything
The N prefix before a SQL string literal tells SQL Server to interpret that literal as a Unicode (nvarchar) value rather than a non-Unicode (varchar) value. It is a small syntactic detail with large performance implications.
‘Hello’ — stored as: 48 65 6C 6C 6F (5 bytes)
— With N prefix: nvarchar literal (Unicode UTF-16)
N’Hello’ — stored as: 48 00 65 00 6C 00 6C 00 6F 00 (10 bytes)
— Telugu word “తెలుగు” — only works with N prefix
N’తెలుగు’ — Unicode code points: U+0C24 U+0C46 U+0C32 …
‘తెలుగు’ — Stored incorrectly — garbled data, data loss
The rule is simple: whenever you write a string literal that will be compared against or inserted into an nvarchar column, always prefix it with N. This applies to INSERT statements, WHERE clauses, JOIN conditions, and stored procedure parameters.
6. SSIS Unicode Conversion Error — Causes and Fixes
The error message “Cannot convert between unicode and non-unicode string data types” is the most frequently encountered data type error in SQL Server Integration Services. It appears in the Data Flow tab when source and destination string columns use mismatched encoding types.
Why It Happens
Flat File Sources and Excel Sources in SSIS default to outputting string columns as DT_WSTR — the SSIS Unicode string type. OLE DB Destinations connecting to varchar columns expect DT_STR — the SSIS non-Unicode type. SSIS does not bridge this gap automatically.
Fix 1 — Data Conversion Transformation
[Source] → [Data Conversion] → [Destination]
— Inside Data Conversion Transformation:
Column: CustomerName (DT_WSTR)
Output Alias: CustomerName_Converted
Data Type: DT_STR ← non-Unicode
Length: 100
Code Page: 1252 ← Windows Western European
Fix 2 — Derived Column Expression
(DT_STR, 100, 1252)[CustomerName]
— Replace 100 with your actual max field length
— Replace 1252 with your target code page
Fix 3 — Cast at the SQL Source
SELECT
CAST(CustomerName AS VARCHAR(100)) AS CustomerName,
CAST(EmailAddress AS VARCHAR(200)) AS EmailAddress
FROM Customers
For a broader explanation of the Unicode vs non-Unicode concept beyond SQL Server, see What is Unicode and Non-Unicode? — and for font encoding conversion between Unicode and legacy Indian language fonts, visit the Unicode to Non-Unicode converter tool.
7. SQL Server Unicode Best Practices
- Always use N prefix with string literals compared against nvarchar columns in WHERE, JOIN, and INSERT statements
- Check execution plans regularly for CONVERT_IMPLICIT warnings — each one is a potential index invalidation
- Design consistently — do not mix varchar and nvarchar in columns that will be joined or compared; pick one type per functional area and stick to it
- Avoid deprecated types — replace text with varchar(max) and ntext with nvarchar(max) in any modernization effort
- Set database collation carefully — collation affects how string comparisons work and interacts with Unicode/non-Unicode type behavior
- In SSIS, always add a Data Conversion step when loading flat file or Excel data into varchar columns
- For international applications, default to nvarchar for all user-facing text columns from the start — retrofitting is expensive
// Summary
The Unicode vs non-Unicode distinction in SQL Server is not theoretical — it directly affects query performance, data integrity, and integration reliability. nvarchar uses UTF-16 and supports every language. varchar uses a code page and supports one language at a time.
The most dangerous trap is implicit conversion — a silent performance degradation that can make an indexed query do a full table scan without throwing a single error. The fix is one character: the N prefix on string literals.
Build Unicode-first. Use nvarchar for user-facing text. Monitor execution plans. And use explicit casts whenever you move data between Unicode and non-Unicode systems.