Text_LanguageDetect

Wednesday, January 11, 2006

What I'm working on now

The next version is finished. I even got it to pass all of the dozens of regression tests, even though I completely rewrote the parser. I guess the only thing holding me up now is the naming of an object.

Whenever the detector wants to slice-and-dice a piece of text, it instantiates this new object. Should it be called a Sample object (as in a sample of text) or a Parser object? I think the two names imply different things for future development, in how the object should be used or subclassed.

Also, I'm downloading more training text from Wikipedia. I've found that the non-English Wikipedias are mostly worthless auto-generated text if they have fewer than 1000 articles. Too bad; something tickles me inside about being able to detect Yiddish. I don't think new languages will make it into the next release.

And finally, the demo page is failing. I don't run the server it's on, so I don't know who changed what.
