That’s obviously an artificial example, but the point is that text files are inherently ambiguous. This poses a challenge to software that loads text.
It’s a problem that has been around for a while. Fortunately, the text file landscape has gotten simpler over time, with UTF-8 winning out over other character encodings. More than 95% of the Internet is now delivered using UTF-8. It’s impressive how quickly that number has changed: it was less than 10% as recently as 2006.
UTF-8 hasn’t taken over the world just yet, though. The Windows Registry editor, for example, still saves text files as UTF-16. When writing a text file from Python, the default encoding is platform-dependent; on my Windows PC, it’s Windows-1252. In other words, the ambiguity problem still exists today. And even if a text file is encoded in UTF-8, there are still variations in format, since the file may or may not start with a byte order mark (BOM) and could use either UNIX-style or Windows-style line endings.
Plywood is a cross-platform open-source C++ framework I released two months ago. When opening a text file using Plywood, you have a couple of options: