work
I've been busy lately, so no work has been done on LanguageDetect in the past few days.
I made some speed improvements and wrote a basic speed tester. For longer test strings the new version should now be only 5-10% slower than the old version (but it is still slower). However, since this class has been shown to be very accurate even for very short strings, one may speed up detection by truncating test strings to, say, 300K at most, with virtually no loss in accuracy.
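Here is a rough sketch of that truncation idea, written in Python purely for illustration. The names `truncate_sample` and `limit` are hypothetical and not part of LanguageDetect, and the limit is left as a parameter, since the right cutoff is a judgment call.

```python
def truncate_sample(text: str, limit: int) -> str:
    """Return at most `limit` characters of `text` (hypothetical helper)."""
    if len(text) <= limit:
        return text  # already short enough, nothing to do
    cut = text[:limit]
    # Back up to the last space so the sample does not end in a partial word.
    space = cut.rfind(" ")
    return cut[:space] if space > 0 else cut
```

The shortened string would then be handed to the detector in place of the full text.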
I've decided to delay the release until I can figure out why the package is slower for longer text samples.
All remaining bugs have been fixed for the next release, and all of the unit tests new and old pass successfully. All that remains now are minor cleanup tasks, especially in the inline documentation. Expect a release tomorrow.
I cleaned up one more bug.
Based on the new tests I devised, it looks like at least one calculation in the new Unicode code is being done wrong, so that's going to hold things up for at least another day.
I've decided on the name Parser for the new class. This means that a new release of the package will be coming soon.
Sometimes piping long strings of unix commands together is fun.
The next version is finished -- I even got it to pass all of the dozens of regression tests, even though I completely rewrote the parser. I guess the only thing holding me up now is the naming of an object.
I decided to use blogspot for this blog because a) it's not going to go out of business like these "free wordpress" sites might, and more importantly b) I don't want to worry about maintaining it.