avidiax 27 minutes ago [-]
If you like this, you would probably enjoy Princeton WordNet. They have unfortunately stopped developing it.
You can still browse it a bit online with some 3rd party sites: https://en-word.net/
castral 2 days ago [-]
It's an interesting visualization for sure, but I don't really know what I can take away from it. Is it useful for something?
h4ch1 2 days ago [-]
You can look at this as how small sets of a primitive lexicon give rise to a larger, more complex language. At least that's how I interpret it.
rhelz 2 days ago [-]
Beautiful! Thank you!
readthenotes1 50 minutes ago [-]
"Is", "be", and "the" don't show up in the search box.
What am I missing?
Cyphase 26 minutes ago [-]
Other words too, e.g. "from".
My first thought was that the creator used a search library that filters common words by default, but the search code is all in the page and doesn't do that.
My second thought was that the 10k word corpus doesn't include those most common words. But it does.
Then I realized that the creator filtered them out. The page does say "7931 words", and the title here on HN says "10k* most common". The original corpus has exactly 10,000 words.
https://github.com/first20hours/google-10000-english/blob/d0...
The first 21 include all four we've mentioned:
the, of, and, to, a, in, for, is, on, that, by, this, with, i, you, it, not, or, be, are, from
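For anyone who wants to check this themselves, here is a rough sketch in Python. It only uses the 21 words listed above and the two counts quoted from the page and the corpus; the helper name `rank_of` is made up for illustration, and the "dropped" figure assumes the page's 7931 words are all drawn from the 10,000-word corpus.

```python
# First 21 entries of the corpus, copied from the list above.
first_21 = (
    "the, of, and, to, a, in, for, is, on, that, by, this, "
    "with, i, you, it, not, or, be, are, from"
).split(", ")

# The four words reported in this thread as missing from the search box.
missing_from_search = ["is", "be", "the", "from"]


def rank_of(word, words):
    """Return the 1-based position of word in words, or None if absent."""
    return words.index(word) + 1 if word in words else None


# Where each missing word sits in the original frequency ranking.
ranks = {w: rank_of(w, first_21) for w in missing_from_search}

# Number of words filtered out, ASSUMING the page's 7931 words
# are a subset of the 10,000-word corpus.
dropped = 10_000 - 7_931

print(ranks)    # {'is': 8, 'be': 19, 'the': 1, 'from': 21}
print(dropped)  # 2069
```

So all four are in the top 21 of the corpus, and roughly 2,000 words were removed before the page was built, which fits the idea of a stop-word style filter rather than a search bug.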