Use more tags to help the ICU detector.
The detector only passed non-ASCII data to ICU. In some cases that data
could be very short, and ICU would then report a low confidence level
for the actual encoding. Padding the data with additional (ASCII) tags
improves accuracy for such files. Because this can reduce accuracy in
other cases, only do this when the initial confidence is low.
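The control flow described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual CharacterEncodingDetector code. `Detection`, `Detector`, `detectWithPadding` and `goodThreshold` are hypothetical names. The callback stands in for the ICU detection pass, and the threshold value is an assumption because this change does not specify one. Keeping whichever pass is more confident is also an assumption of the sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for an ICU match: encoding name plus confidence.
struct Detection {
    std::string encoding;
    int confidence;  // 0..100, the range ICU uses for match confidence
};

// Hypothetical detector callback; in the real code ICU plays this role.
using Detector = std::function<Detection(const std::string&)>;

// Detect on the non-ASCII data alone first. Only when that is not a good
// match, append extra ASCII tag text and run the detector again.
Detection detectWithPadding(const std::string& nonAscii,
                            const std::string& asciiPadding,
                            const Detector& detect,
                            int goodThreshold) {
    Detection first = detect(nonAscii);
    bool goodmatch = first.confidence >= goodThreshold;
    if (goodmatch || asciiPadding.empty()) {
        // Confident enough: skip padding, since it can reduce accuracy.
        return first;
    }
    Detection padded = detect(nonAscii + asciiPadding);
    // Assumption of this sketch: keep the more confident of the two passes.
    return padded.confidence > first.confidence ? padded : first;
}
```

With this structure the extra detection pass only runs for low-confidence inputs, so files that ICU already classifies confidently are unaffected by the padding.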
b/13473604
Change-Id: I63d932043155c310b0e358cdf2d37787961e94b7
diff --git a/media/libmedia/CharacterEncodingDetector.h b/media/libmedia/CharacterEncodingDetector.h
index 3655a91..7b5ed86 100644
--- a/media/libmedia/CharacterEncodingDetector.h
+++ b/media/libmedia/CharacterEncodingDetector.h
@@ -41,7 +41,9 @@
private:
const UCharsetMatch *getPreferred(
- const char *input, size_t len, const UCharsetMatch** ucma, size_t matches);
+ const char *input, size_t len,
+ const UCharsetMatch** ucma, size_t matches,
+ bool *goodmatch);
bool isFrequent(const uint16_t *values, uint32_t c);