C# strings silently kill your SQL Server indexes in Dapper

https://consultwithgriff.com/dapper-nvarchar-implicit-conversion-performance-trap

59•PretzelFisch•3h ago

Comments

wvenable•3h ago

This really doesn't have anything to do with C#. This is your classic nvarchar vs varchar issue (or unicode vs ASCII). The same thing happens if you mix collations.

I'm not sure why anyone would choose varchar for a column in 2026 unless you have some sort of ancient backwards compatibility situation.

beart•3h ago

I agree with your first point. I've seen this same issue crop up in several other ORMs.

As to your second point: VARCHAR uses N + 2 bytes, whereas NVARCHAR uses N*2 + 2 bytes for storage (at least on SQL Server). The vast majority of character fields in databases I've worked with do not need to store unicode values.
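
To make the storage arithmetic in this comment concrete, here is a minimal sketch of the two formulas it quotes. The function names are made up for illustration, and the figures assume a single-byte code page for VARCHAR and characters that each fit in one UTF-16 code unit for NVARCHAR.

```python
def varchar_bytes(n_chars: int) -> int:
    """Storage for a VARCHAR value: 1 byte per character plus 2 bytes of overhead."""
    return n_chars + 2


def nvarchar_bytes(n_chars: int) -> int:
    """Storage for an NVARCHAR value: 2 bytes per character plus 2 bytes of overhead."""
    return 2 * n_chars + 2
```

For a 50-character value this gives 52 bytes as VARCHAR and 102 bytes as NVARCHAR, which is where the "roughly half" figure for longer strings comes from.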

_3u10•3h ago

Generally if it stores user input it needs to support Unicode. That said UTF-8 is probably a way better choice than UTF-16/UCS-2

SigmundA•2h ago

UTF-8 is a relatively new thing in MSSQL and had lots of issues initially, I agree it's better and should have been implemented in the product long ago.

I have avoided it and have not followed if the issues are fully resolved, I would hope they are.

kstrauser•2h ago

> UTF-8 is a relatively new thing in MSSQL and had lots of issues initially, I agree it's better and should have been implemented in the product long ago.

Their insistence on making the rest of the world go along with their obsolete pet scheme would be annoying if I ever had to use their stuff for anything ever. UTF-8 was conceived in 1992, and here we are in 2026 with a reasonably popular database still considering it the new thing.

wvenable•3h ago

> The vast majority of character fields in databases I've worked with do not need to store unicode values.

This has not been my experience at all. Exactly the opposite, in fact. ASCII is dead.

SigmundA•2h ago

Vast majority of text fields I see are coded values that are perfectly fine using ascii, but I deal mostly with English language systems.

Text fields that users can type into directly especially multiline tend to need unicode but they are far fewer.

simonask•1h ago

English has plenty of Unicode — claiming otherwise is such a cliché…

Unicode is a requirement everywhere human language is used, from Earth to the Boötes Void.

NegativeLatency•1h ago

Also less awkward to make it right the first time, instead of explaining why someone can’t type their name or an emoji

psidebot•1h ago

Some examples of coded fields that may be known to be ascii: order name, department code, business title, cost center, location id, preferred language, account type…

SigmundA•2h ago

To complicate matters SQL Server can do Nvarchar compression, but they should have just done UTF-8 long ago:

https://learn.microsoft.com/en-us/sql/relational-databases/d...

Also UTF-8 is actually just a varchar collation so you don't use nvarchar with that, lol?

SigmundA•2h ago

Yes I have run into this regardless of client language and I consider it a defect in the optimizer.

wvenable•2h ago

I wouldn't consider it a defect in the optimizer; it's doing exactly what it's told to do. It cannot convert an nvarchar to varchar -- that's a narrowing conversion. All it can do is convert the other way and lose the ability to use the index. If you think that there is no danger converting an nvarchar that contains only ASCII to varchar then I have about 70+ different collations that say otherwise.

applfanboysbgon•2h ago

I think this is a rather pertinent showcase of the danger of outsourcing your thinking to LLMs. This article strongly indicates to me that it is LLM-written, and it's likely the LLM diagnosed the issue as being a C# issue. When you don't understand the systems you're building with, all you can do is take the plausible-sounding generated text about what went wrong for granted, and then I suppose regurgitate it on your LLM-generated portfolio website in an ostensible show of your profound architectural knowledge.

cosmez•2h ago

This is a common issue, and most developers I worked with are not aware of it until they see the performance issues.

Most people are not aware of how Dapper maps types under the hood; once you know, you start being careful about it.

Nothing to do with LLMs, just plain old learning through mistakes.

keithnz•1h ago

Actually, LLMs do way better: with Dapper, the LLM generates code that specifies types for strings.

ziml77•1h ago

This is not at all just an LLM thing. I've been working with C# and MS SQL Server for many years and never even considered this could be happening when I use Dapper. There's likely code I have deployed running suboptimally because of this.

And it's not like I don't care about performance. If I see a small query taking more than a fraction of a second when testing in SSMS, or if I see a larger query taking more than a few seconds, I will dig into the query plan and try to make changes to improve it. For code that I took from testing in SSMS and moved into a Dapper query, I wouldn't have noticed performance issues from that move if the slowdown was never particularly large.

dspillett•2h ago

> I'm not sure why anyone would choose varchar for a column in 2026

The same string takes roughly half the storage space, meaning more rows per page and therefore a smaller working set needed in memory for the same queries and less IO. Any indexes on those columns will be similarly smaller. So if you are storing things that you know won't break out of the standard ASCII set⁰, stick with [VAR]CHARs¹, otherwise use N[VAR]CHARs.

Of course if you can guarantee that your stuff will be used on recent enough SQL Server versions that are configured to support UTF8 collations, then default to that instead unless you expect data in a character set where that might increase the data size over UTF16. You'll get the same size benefit for pure ASCII without losing wider character set support.

Furthermore, if you are using row or page compression it doesn't really matter: your wide-character strings will effectively be UTF8 encoded anyway. But be aware that there is a CPU hit for processing compressed rows and pages every access because they remain compressed in memory as well as on-disk.

--------

[0] Codes with fixed ranges, etc.

[1] Some would put that the other way around, and say “use NVARCHAR if you think there might be any non-ASCII characters”, but defaulting to NVARCHAR and moving to VARCHAR only if you are confident is the safer approach IMO.

paulsutter•1h ago

Utf8 solved this completely. It works with any length unicode and on average takes up almost as little storage as ascii.

Utf16 is brain dead and an embarrassment
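
As a rough illustration of the size claims in this subthread, the sketch below compares encoded byte lengths for a few sample strings. The sample values and the `encoded_sizes` helper are invented for this example; it measures raw encoded length only and says nothing about SQL Server's per-row overhead or compression.

```python
# Hypothetical sample values: an ASCII code, an accented name, and Japanese text.
SAMPLES = {
    "ascii code": "COST-CENTER-0042",
    "accented name": "Zo\u00eb Bront\u00eb",
    "japanese": "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8",
}


def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),
    }


def report() -> dict:
    """Map each sample label to its encoded sizes."""
    return {label: encoded_sizes(text) for label, text in SAMPLES.items()}
```

In this sketch the ASCII code is half the size in UTF-8, the accented name is still smaller in UTF-8, and the Japanese sample is larger in UTF-8 than in UTF-16, which matches the caveat about character sets where UTF-8 can increase the data size.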

wvenable•1h ago

Blame the Unicode consortium for not coming up with UTF-8 first (or, really, at all). And for assuming that 65536 code points would be enough for everyone.

So many problems could be solved with a time machine.

kstrauser•16m ago

The first draft of Unicode was in 1988. Thompson and Pike came up with UTF-8 in 1992, made an RFC in 1998. UTF-16 came along in 1996, made an RFC in 2000.

The time machine would've involved Microsoft saying "it's clear now that UCS-2 was a bad idea, so let's start migrating to something genuinely better".

jiggawatts•3h ago

This feels like a bug in the SQL query optimizer rather than Dapper.

It ought to be smart enough to convert a constant parameter to the target column type in a predicate constraint and then check for the availability of a covering index.

wvenable•3h ago

It's the optimizer caching the query plan as a parameterized query. It's not re-planning the index lookup on every execution.

SigmundA•3h ago

The parameter type is part of the cache identity, nvarchar and varchar would have two cache entries with possibly different plans.

valiant55•2h ago

There's a data type precedence that it uses to determine which value should be cast[0]. Nvarchar is higher precedence, therefore the varchar value is "lifted" to an nvarchar value first. This wouldn't be an issue if the types were reversed.

0: https://learn.microsoft.com/en-us/sql/t-sql/data-types/data-...
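
The effect described in this comment can be pictured with a toy model. This is not how SQL Server is implemented: `bytes` stands in for varchar, `str` for nvarchar, a sorted list stands in for the index, and all names are hypothetical. It only shows why a parameter of the column's own type allows a binary search, while a wider parameter forces each stored key to be converted before it can be compared.

```python
import bisect

# Hypothetical "index": sorted varchar-like keys, modelled as bytes.
index_keys = sorted(name.encode("cp1252") for name in ["alice", "bob", "carol", "dave"])


def seek(keys, param: bytes) -> bool:
    """Parameter has the column's type: binary search, like an index seek."""
    i = bisect.bisect_left(keys, param)
    return i < len(keys) and keys[i] == param


def scan_with_conversion(keys, param: str):
    """Parameter has the wider type: every key is 'lifted' before comparing.

    Returns (found, number_of_keys_converted).
    """
    conversions = 0
    for key in keys:
        conversions += 1
        if key.decode("cp1252") == param:
            return True, conversions
    return False, conversions
```

A miss in `scan_with_conversion` touches every key, whereas `seek` needs only a logarithmic number of comparisons.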

beart•2h ago

How do you safely convert a 2 byte character to a 1 byte character?

jiggawatts•2h ago

Easily! If it doesn't convert successfully because it includes characters outside of the range of the target codepage then the equality condition is necessarily false, and the engine should short-circuit and return an empty set.
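
The proposal in this comment could be sketched roughly as follows. This is an illustration of the idea only, under the assumption that the column uses a single-byte code page (cp1252 here) and that equality is a plain byte comparison; it ignores collation rules, which wvenable's comment above raises as an objection. The `narrow` and `lookup` names and the dictionary "index" are hypothetical.

```python
from typing import Optional


def narrow(param: str, codepage: str = "cp1252") -> Optional[bytes]:
    """Try to convert a wide string to the column's code page; None if it cannot fit."""
    try:
        return param.encode(codepage)
    except UnicodeEncodeError:
        return None  # contains characters the column could never store


def lookup(index: dict, param: str) -> list:
    """Equality lookup that short-circuits when the parameter cannot be narrowed."""
    key = narrow(param)
    if key is None:
        return []  # no stored value can equal it, so skip the index entirely
    return index.get(key, [])
```

With an index such as `{b"alice": [1], b"bob": [2]}`, a lookup for "alice" returns its rows, while a lookup for a Japanese string returns an empty list without touching the index.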

enord•3h ago

This is due to utf-16, an unforgivable abomination.

adzm•2h ago

even better is Entity Framework and how it handles null strings by creating some strange predicates in SQL that end up being unable to seek into string indexes

smithkl42•2h ago

Been bit by that before: it's not just an issue with Dapper, it can also hit you with Entity Framework.

andrelaszlo•2h ago

I thought, having just read the title, that maybe it's time to upgrade if you're still on Ubuntu 6.06.

briHass•2h ago

I've found and fixed this bug before. There are 2 other ways to handle it:

Dapper has a static configuration for things like TypeMappers, and you can change the default mapping for string to use varchar with: Dapper.SqlMapper.AddTypeMap(typeof(string),System.Data.DbType.AnsiString). I typically set that in the app startup, because I avoid NVARCHAR almost entirely (to save the extra byte per character, since I rarely need anything outside of ANSI.)

Or, one could use stored procedures. Assuming you take in a parameter that is the correct type for your indexed predicate, the conversion happens once when the SPROC is called, not done by the optimizer in the query.

I still have mixed feelings about overuse of SQL stored procedures, but this is a classic example of where one of their benefits is revealed: they are a defined interface for the database, where DB-specific types can be handled instead of polluting your code with specifics about your DB.

(This is also a problem for other type mismatches like DateTime/Date, numeric types, etc.)

ziml77•1h ago

Sprocs are how I handle complex queries rather than embedding them in our server applications. It's definitely saved me from running into problems like this. And it comes with another advantage of giving DBAs more control to manage performance (DBAs do not like hearing that they can't take care of a performance issue that's cropped up because the query is compiled into an application)

maciekkmrk•54m ago

Interesting problem, but the AI prose makes me not want to read to the end.

diath•52m ago

It's weird that the article does not show any benchmarks but crappy descriptions like "milliseconds to microseconds" and "tens of thousands to single digits". This is the kind of vague performance description LLMs like to give when you ask them about performance differences between solutions and don't explicitly ask for a benchmark suite.

mvdtnz•15m ago

This is a really interesting blog post - the kind of old school stuff the web used to be riddled with. I must say - would it have been that hard to just write this by hand? The AI adds nothing here but the same annoying old AI-isms that distract from the piece.
