Unicode vs Non-Unicode in SQL Server — Developer’s Complete Guide

Unicode vs Non-Unicode in SQL Server — Developer’s Complete Guide
Developer Guide — SQL Server

Unicode vs Non-Unicode
in SQL Server
A Developer’s Complete Reference

Data type mismatches between Unicode and non-Unicode columns are behind some of the most common and costly SQL Server performance problems. This guide explains the difference, the traps, and the fixes.

// 11 min read // SQL Server // SSIS // Performance

SQL Server stores text in two completely different ways depending on whether the column is defined as a Unicode or non-Unicode type. The gap between these two approaches is not just about character capacity — it affects storage size, query plan selection, index utilization, interface compatibility, and the behavior of your entire data integration pipeline.

Developers who do not understand this distinction often introduce performance bugs that only manifest at scale — slow queries that run fine on small tables but become painfully expensive as data grows. This guide makes the distinction concrete with real examples.

1. Why Encoding Matters Inside SQL Server

When you store the string “Hyderabad” in a SQL Server column, the database does not store the letters — it stores numbers. The encoding determines which numbers. A non-Unicode column stores one byte per character using the database’s configured code page. A Unicode column stores two bytes per character using UTF-16 encoding.

For English text, both systems produce readable results — but they are storing fundamentally different byte sequences. When you mix these two types in queries, joins, or comparisons, SQL Server has to resolve the mismatch — and that resolution has real performance costs.

SQL Server uses UTF-16 LE (Little Endian) encoding for all Unicode string storage — the same encoding used internally by the Windows operating system and the .NET runtime. This means nvarchar columns store exactly 2 bytes per BMP character and 4 bytes for supplementary characters outside the Basic Multilingual Plane.

2. Complete Unicode vs Non-Unicode Data Type Reference

Data TypeUnicode?LengthStorageMax SizeStatus
char(n)Non-UnicodeFixed1 byte/char8,000 bytesActive
varchar(n)Non-UnicodeVariable1 byte/char8,000 bytesActive
varchar(max)Non-UnicodeVariable1 byte/char2 GBActive
textNon-UnicodeVariable1 byte/char2 GB⚠ Deprecated
nchar(n)UnicodeFixed2 bytes/char8,000 bytesActive
nvarchar(n)UnicodeVariable2 bytes/char8,000 bytesActive
nvarchar(max)UnicodeVariable2 bytes/char2 GBActive
ntextUnicodeVariable2 bytes/char2 GB⚠ Deprecated

The “n” prefix in nvarchar and nchar stands for National Character — terminology borrowed directly from the ISO/IEC SQL standard. It signals Unicode storage. The deprecated types (text and ntext) should not be used in new code — use varchar(max) and nvarchar(max) instead.

3. nvarchar vs varchar — The Practical Decision

The core question every database designer faces is: when should a column be nvarchar versus varchar? The answer depends on three factors: what characters the data actually contains, where the data comes from, and how much storage you can afford.

Use nvarchar (Unicode) when:

  • The column will store text in any non-Latin script — Arabic, Chinese, Japanese, Hindi, Telugu, Bengali, etc.
  • The data comes from user input via a web form, mobile app, or modern API (all Unicode sources)
  • The column stores names, addresses, or descriptions that users might enter in any language
  • The data will be exported to web services, JSON, or XML (all Unicode formats)
  • You are building a multilingual application or SaaS product

Use varchar (Non-Unicode) when:

  • The column stores strictly ASCII data — codes, identifiers, postal codes, URLs, email addresses
  • Storage optimization is critical and you can guarantee the character set will never expand
  • The column stores internal system values that users never input directly
  • You are interfacing with a legacy system that only sends single-byte encoded data
When in doubt, use nvarchar. The storage cost — roughly double compared to varchar for ASCII content — is almost always worth the flexibility and future-proofing. Storage is cheap; data corruption from encoding mismatches is not.

4. Implicit Conversion — The Hidden Performance Killer

This is the most important section in this guide for SQL Server performance. Implicit conversion happens when SQL Server automatically converts one data type to another to resolve a comparison or join — and it happens silently, without warning, in your query execution plan.

Here is the scenario: you have a table with an nvarchar column named CustomerEmail. You write a query with a literal string:

SQLSLOW — Causes Implicit Conversion
SELECT CustomerID, FirstName, LastName
FROM Customers
WHERE CustomerEmail = ‘user@example.com’

— SQL Server sees: nvarchar column vs varchar literal
— It converts the varchar literal to nvarchar
— This prevents index seek → forces index scan
— On 10M rows: this is the difference between 1ms and 8 seconds
SQLFAST — No Implicit Conversion
SELECT CustomerID, FirstName, LastName
FROM Customers
WHERE CustomerEmail = N’user@example.com’

— N prefix makes this a Unicode nvarchar literal
— Types match: nvarchar column vs nvarchar literal
— SQL Server can use index seek → millisecond response
Implicit conversion between Unicode and non-Unicode types always goes from non-Unicode to Unicode — because Unicode has higher data type precedence in SQL Server’s type hierarchy. This means varchar values are silently cast to nvarchar. The cast itself is not the problem — the problem is that this cast happens on the column being searched, which invalidates the index and forces a full scan.

You can detect implicit conversions in your queries by examining execution plans. Look for the yellow warning triangle icon on operators — hover over it, and if you see “Type conversion in expression may affect CardinalityEstimate” or “CONVERT_IMPLICIT”, you have an implicit conversion problem that needs fixing.

5. The N Prefix — Why One Character Changes Everything

The N prefix before a SQL string literal tells SQL Server to interpret that literal as a Unicode (nvarchar) value rather than a non-Unicode (varchar) value. It is a small syntactic detail with large performance implications.

SQLUnderstanding the N Prefix
— Without N prefix: varchar literal (non-Unicode)
‘Hello’ — stored as: 48 65 6C 6C 6F (5 bytes)

— With N prefix: nvarchar literal (Unicode UTF-16)
N’Hello’ — stored as: 48 00 65 00 6C 00 6C 00 6F 00 (10 bytes)

— Telugu word “తెలుగు” — only works with N prefix
N’తెలుగు’ — Unicode code points: U+0C24 U+0C46 U+0C32 …
‘తెలుగు’ — Stored incorrectly — garbled data, data loss

The rule is simple: whenever you write a string literal that will be compared against or inserted into an nvarchar column, always prefix it with N. This applies to INSERT statements, WHERE clauses, JOIN conditions, and stored procedure parameters.

6. SSIS Unicode Conversion Error — Causes and Fixes

The error message “Cannot convert between unicode and non-unicode string data types” is the most frequently encountered data type error in SQL Server Integration Services. It appears in the Data Flow tab when source and destination string columns use mismatched encoding types.

Why It Happens

Flat File Sources and Excel Sources in SSIS default to outputting string columns as DT_WSTR — the SSIS Unicode string type. OLE DB Destinations connecting to varchar columns expect DT_STR — the SSIS non-Unicode type. SSIS does not bridge this gap automatically.

Fix 1 — Data Conversion Transformation

SSISRecommended Fix
— In the Data Flow:
[Source] → [Data Conversion] → [Destination]

— Inside Data Conversion Transformation:
Column: CustomerName (DT_WSTR)
Output Alias: CustomerName_Converted
Data Type: DT_STR ← non-Unicode
Length: 100
Code Page: 1252 ← Windows Western European

Fix 2 — Derived Column Expression

SSIS ExpressionDerived Column Method
— Cast Unicode DT_WSTR to non-Unicode DT_STR
(DT_STR, 100, 1252)[CustomerName]

— Replace 100 with your actual max field length
— Replace 1252 with your target code page

Fix 3 — Cast at the SQL Source

SQLSource-Level Cast
— Change OLE DB Source to SQL Command mode and cast:
SELECT
CAST(CustomerName AS VARCHAR(100)) AS CustomerName,
CAST(EmailAddress AS VARCHAR(200)) AS EmailAddress
FROM Customers
RELATED READING

For a broader explanation of the Unicode vs non-Unicode concept beyond SQL Server, see What is Unicode and Non-Unicode? — and for font encoding conversion between Unicode and legacy Indian language fonts, visit the Unicode to Non-Unicode converter tool.

7. SQL Server Unicode Best Practices

  • Always use N prefix with string literals compared against nvarchar columns in WHERE, JOIN, and INSERT statements
  • Check execution plans regularly for CONVERT_IMPLICIT warnings — each one is a potential index invalidation
  • Design consistently — do not mix varchar and nvarchar in columns that will be joined or compared; pick one type per functional area and stick to it
  • Avoid deprecated types — replace text with varchar(max) and ntext with nvarchar(max) in any modernization effort
  • Set database collation carefully — collation affects how string comparisons work and interacts with Unicode/non-Unicode type behavior
  • In SSIS, always add a Data Conversion step when loading flat file or Excel data into varchar columns
  • For international applications, default to nvarchar for all user-facing text columns from the start — retrofitting is expensive

// Summary

The Unicode vs non-Unicode distinction in SQL Server is not theoretical — it directly affects query performance, data integrity, and integration reliability. nvarchar uses UTF-16 and supports every language. varchar uses a code page and supports one language at a time.

The most dangerous trap is implicit conversion — a silent performance degradation that can make an indexed query do a full table scan without throwing a single error. The fix is one character: the N prefix on string literals.

Build Unicode-first. Use nvarchar for user-facing text. Monitor execution plans. And use explicit casts whenever you move data between Unicode and non-Unicode systems.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top