I don't mean to write off the possibility (especially in 2014) that OP could have written faster C code for the loading process, particularly hardware-specific code. But I would take that as more of a reason to contribute to the project than to do what's described here.
teleforce•3h ago