There are times I miss being a kid and just doing things.
We run into this issue too: we want to upload large volumes of data but don't want to assume access to BCP on every DB server.
We wrote a little utility that is actually pretty fast compared to the other methods we've tested (fast = about 1,000,000 rows per minute for a table with 10 random columns of random data). Here's the approach:
1-Convert rows into fixed-length strings so each row is uploaded as one single varchar column (which makes parsing and execution of the SQL statement during upload much quicker)
2-Repeatedly upload groups of fixed-length rows into a temp table until everything is uploaded.
Details:
Multiple fixed length rows are combined into one fixed length varchar column that will be uploaded as one single raw buffer row. We found a buffer size of 15,000 to be the sweet spot.
Multiple threads will each process a subset of source data rows. We found 5 threads to be generally pretty good.
At the end of this step, the destination temp table will have X rows of buffers (the buffer column is just a varchar(15000)), and inside each of those buffers are Y source data rows, each with Z columns in fixed-width format.
3-Once the buffer rows are all uploaded, split out the source data rows+columns using a temp sproc generated for the exact schema (e.g. substring(Buffer_Data,x,y) as Cust_Name); see the sketch below.
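Here's a minimal sketch of that packing/unpacking idea (not the actual utility), assuming pyodbc and a hypothetical two-column schema: Cust_Name padded to 30 chars, Cust_Balance to 12. The DSN, table names, and widths are made up, the generated-per-schema temp sproc is replaced by an inline query, and the multi-threaded upload is omitted.

    import pyodbc

    ROW_WIDTH = 42          # 30 chars Cust_Name + 12 chars Cust_Balance
    BUFFER_SIZE = 15000     # the sweet spot mentioned above
    ROWS_PER_BUFFER = BUFFER_SIZE // ROW_WIDTH

    def pack(rows):
        """Yield varchar buffers, each holding many fixed-width source rows."""
        for i in range(0, len(rows), ROWS_PER_BUFFER):
            chunk = rows[i:i + ROWS_PER_BUFFER]
            yield "".join(f"{name:<30.30}{bal:<12.12}" for name, bal in chunk)

    conn = pyodbc.connect("DSN=mydsn")   # placeholder connection
    cur = conn.cursor()
    cur.execute("CREATE TABLE #Staging (Buffer_Data varchar(15000) NOT NULL)")

    rows = [("Alice", "100.00"), ("Bob", "250.50")]   # illustrative source data
    cur.executemany("INSERT INTO #Staging (Buffer_Data) VALUES (?)",
                    [(b,) for b in pack(rows)])

    # Step 3: split the buffers back into rows/columns with SUBSTRING.
    # The real approach generates this (or a temp sproc) from the exact schema;
    # offsets here are hard-coded for the made-up 42-char layout.
    unpack_sql = """
    WITH N AS (
        SELECT TOP (15000 / 42)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_objects
    )
    SELECT RTRIM(SUBSTRING(s.Buffer_Data, (N.n - 1) * 42 + 1, 30))  AS Cust_Name,
           RTRIM(SUBSTRING(s.Buffer_Data, (N.n - 1) * 42 + 31, 12)) AS Cust_Balance
    FROM #Staging AS s
    JOIN N ON (N.n - 1) * 42 < DATALENGTH(s.Buffer_Data)
    """
    for name, balance in cur.execute(unpack_sql):
        print(name, balance)

The fixed-width layout is what keeps the server-side parse cheap: every value is located by offset, so the unpack query is nothing but SUBSTRING calls.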
We wound up staging a local tab-delimited file and importing via bcp:
bcp "$DESTINATION_TABLE" in "$STAGE_FILE.dat" -u -F 2 -c -t'\t'
Not elegant, but it works.
https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-...
> Beginning with SQL Server 2017 (14.x), the data_file can be in Azure Blob Storage.
You don't need to have access to the MSSQL box to invoke high performance batch loads. UNC should give you what you need if your IT department is willing to work with it.
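As a sketch of that: a BULK INSERT can be issued from any ordinary client connection, as long as the SQL Server service account can read the share and the login has bulk-load permission. The DSN, table name, and UNC path below are placeholders, not anything from this thread.

    import pyodbc

    # Server-side bulk load triggered from a plain client connection.
    # The server's service account reads the file, not this client.
    conn = pyodbc.connect("DSN=mydsn", autocommit=True)   # placeholder
    cur = conn.cursor()
    cur.execute(r"""
        BULK INSERT dbo.DestinationTable
        FROM '\\fileserver\share\stage_file.dat'
        WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 2, TABLOCK)
    """)
    # Per the doc quoted above, on SQL Server 2017+ the FROM clause can point at
    # Azure Blob Storage instead, via DATA_SOURCE = '<external data source>'.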
Would be nice if MS and Deno could figure things out to get SQL working in Deno.
In later years it became rather good, though, and with Python 3 many of the character-encoding issues went away.
For large data transfers (for example, Pandas or Polars DataFrames with millions of rows), performance and reliability are critical. In my experience, fast_executemany in combination with SQLAlchemy helps, but bulk operations via OPENROWSET or BCP are still the most predictable in production, provided the proper permissions are set.
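For what it's worth, a minimal sketch of that fast_executemany route, assuming SQLAlchemy's mssql+pyodbc dialect and pandas; the connection string and table name are placeholders.

    import pandas as pd
    from sqlalchemy import create_engine

    # fast_executemany makes pyodbc send parameter arrays in batches instead of
    # one round trip per row, which is the main win for large executemany loads.
    engine = create_engine(
        "mssql+pyodbc://user:pass@myserver/mydb?driver=ODBC+Driver+18+for+SQL+Server",
        fast_executemany=True,
    )

    df = pd.DataFrame({"id": range(1_000_000), "val": "x"})   # illustrative data
    df.to_sql("staging_table", engine, if_exists="append", index=False,
              chunksize=10_000)

It still funnels everything through INSERT statements, so for the biggest loads BULK INSERT/OPENROWSET/BCP (as discussed above) tend to win.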
It’s worth noting that even with a new driver, integration complexity often comes from platform differences, TLS/SSL requirements, and corporate IT policies rather than the library itself. For teams looking to simplify workflows, a driver that abstracts these nuances while maintaining control over memory usage and transaction safety would be a strong improvement over rolling your own ODBC setup.
I've not used MSSQL since 2015/2016 and haven't missed much.
Now I live in the OLAP space so I think of it far, far less.
Sure, greenfield projects don't use MSSQL, but there are a ton of companies stuck with MSSQL that will continue to have to fork over big licensing money.
Is this generated by an LLM? The comments read like generic LLM slop.