I think in a language with a lot of similar sounds or even homophones, longer words are easier. For a beginner Chinese speaker who knows both words, hearing "chē" will probably be ambiguous, but "chūzūchē" will be parsed immediately.
I don’t think the ‘longer equals harder’ pattern holds for every language. I actually reached out to the head teacher at CIJ when I first made this analysis and she said the same.
Many of the beginner videos make use of visual hints, as you say (images, props, etc.), and none of these were taken into account in my analysis.
I do think it could be cool to do a 'visual' analysis of CI in the future where you attempt to measure how much context is present (or not) in each video and see what insights you could draw from that.
I will note that the transcripts (and parsing scripts) are not included in the repo. The transcripts are not my intellectual property, so I can't share them (and the parsing scripts are a bit of a dumpster fire).
joshdavham•6h ago
Happy to answer any questions here. I kept my analysis really high-level for a general audience, but since this is HN, we can get a bit nerdy :D