For v1, it's basically a model pipeline: OCR the existing text -> generate a mask -> erase the text -> translate it -> find the closest font via embedding comparison -> render the translated text back onto the image
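In simplified Python, the glue looks roughly like this. To keep the sketch self-contained I've stood in easyocr and OpenCV's inpainting for our actual models, and translate()/match_font() are placeholders for the translation and embedding-based font-matching steps:

    import cv2
    import numpy as np
    import easyocr
    from PIL import Image, ImageDraw, ImageFont

    def translate(text: str, target_lang: str) -> str:
        # Placeholder for the MT step; swap in whatever backend you like.
        return text

    def match_font(crop: np.ndarray) -> str:
        # Placeholder for the embedding-based matcher: embed this crop of the
        # original text, nearest-neighbour search a font library, return the
        # closest .ttf. Hard-coded here.
        return "DejaVuSans.ttf"

    def translate_image(path: str, target_lang: str) -> Image.Image:
        img = cv2.imread(path)
        reader = easyocr.Reader(["en"])
        mask = np.zeros(img.shape[:2], dtype=np.uint8)
        regions = []
        for box, text, conf in reader.readtext(img):   # 1. OCR text + boxes
            pts = np.array(box, dtype=np.int32)
            cv2.fillPoly(mask, [pts], 255)             # 2. build the erase mask
            regions.append((pts, text))
        clean = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)  # 3. erase text
        out = Image.fromarray(cv2.cvtColor(clean, cv2.COLOR_BGR2RGB))
        draw = ImageDraw.Draw(out)
        for pts, text in regions:
            x0, y0 = pts.min(axis=0)
            x1, y1 = pts.max(axis=0)
            crop = img[y0:y1, x0:x1]
            font = ImageFont.truetype(match_font(crop), int(y1 - y0))  # 5. font
            draw.text((int(x0), int(y0)),              # 6. render it back
                      translate(text, target_lang),    # 4. translate
                      font=font, fill="black")
        return out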
v1 was more of a prototype, but it already beats many of the comparable services from Google, Azure, etc.
We're working on v2, where we're training a diffusion model to translate the text directly on the image. The pipeline already works for English and Chinese, and we're now building datasets for other languages.
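For a rough idea of the shape of v2: our model is custom-trained, so this isn't our actual code, but the interface is close to a standard diffusion inpainting pipeline, where the mask marks the text region and the translated string is injected via conditioning (a generic prompt here):

    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    # Any off-the-shelf inpainting checkpoint works for the sketch.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = Image.open("panel.png").convert("RGB")
    mask = Image.open("text_mask.png").convert("RGB")  # white = redraw here
    # A trained model would condition on the target text directly instead of
    # smuggling it through the prompt.
    result = pipe(
        prompt='speech bubble containing the text "Hello!"',
        image=image,
        mask_image=mask,
    ).images[0]
    result.save("translated_panel.png")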