Linux Exercise: Localization and Internationalization

Timezones

  1. In your home directory, create an empty file "nowfile" using touch. Use date and ls -l to check that this file was indeed created just now.
    • # touch nowfile
    • # date
    • # ls -l nowfile
  1. In your .bash_profile, set the TZ variable to "UTC". Log out, log in again and check the output of the date command. Also run the ls -l command again.
    • # vi .bash_profile
      export TZ="UTC"
    • # exit
    • Login again
    • # date
    • # ls -l nowfile
  1. Edit the .bash_profile again and set the proper timezone. Log out, log in and view the results again.
    • # vi .bash_profile
      export TZ="Europe/Amsterdam"
    • # exit
    • Login again
    • # date
    • # ls -l nowfile
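The effect of TZ can also be checked without editing .bash_profile, by setting the variable for a single command. The following is a minimal sketch, assuming GNU date and touch (for the -d @0 and -r options) and an installed timezone database; the scratch file and the variable names are illustrative only.

```shell
# Scratch file whose modification time is pinned to the Unix epoch
# (1970-01-01 00:00:00 UTC), so that the result is reproducible.
scratch=$(mktemp)
touch -d @0 "$scratch"

# Setting TZ in front of a command changes it for that command only.
utc_time=$(TZ=UTC date -r "$scratch" '+%Y-%m-%d %H:%M')
ams_time=$(TZ=Europe/Amsterdam date -r "$scratch" '+%Y-%m-%d %H:%M')

echo "UTC:              $utc_time"
echo "Europe/Amsterdam: $ams_time"

rm -f "$scratch"
```

The timestamp of the file is stored only once; TZ merely changes how it is presented, which is why ls -l shows a different time for the same file after each login.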

Locale

  1. Make a list of the Locales that are currently installed on your system.
    • # locale -a
  1. Check if the Dutch ("nl_NL") locale is on your system. If it is not, install the appropriate language pack.
    • # locale -a | grep nl_NL
    • If not present (typically on a CentOS 8 "minimal" installation):
      # yum list "langpacks-*"
      # yum install langpacks-nl
  1. Look at the current Locale. Run the date command. What is the result?
    • # locale
    • # date
  1. Switch the Locale to Dutch. Again look at the date command. What is the result?
    • # export LC_ALL="nl_NL"
    • # date
  1. Perform a cat of a non-existent file. In what language is the error message?
    • # cat qwerty
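The Locale can also be set for a single command instead of exporting it for the whole session, which makes it easy to compare two languages side by side. This is a sketch under assumptions: the file name is hypothetical and presumed not to exist, and the Dutch message only appears if the nl_NL locale and its translations are installed.

```shell
# A file name that is assumed not to exist (hypothetical name).
missing="/nonexistent/qwerty"

# LC_ALL=C selects the untranslated default messages.
# "|| true" keeps the script going although cat exits with an error.
msg_c=$(LC_ALL=C cat "$missing" 2>&1 || true)

# The same command under the Dutch locale. If nl_NL is not installed,
# the message falls back to the default language.
msg_nl=$(LC_ALL=nl_NL cat "$missing" 2>&1 || true)

echo "C:     $msg_c"
echo "nl_NL: $msg_nl"
```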

Character encoding

  1. Check the character encoding in your local terminal emulation program (e.g. PuTTY). Make sure it uses UTF-8.
    • For PuTTY: Go to Change Settings, then Window, then Translation. In the top pull-down menu you can choose the encoding. Select UTF-8. If you want to, you can save this configuration so that it is active the next time you use this PuTTY profile.
  1. Verify that the hexdump command is installed. If it is not, install it. (It is normally part of the util-linux RPM.)
    • # which hexdump
    • If hexdump is not installed:
      # yum -y install util-linux
  1. Create a testfile "hello.ascii" with the word "Hello" in it.
    • # echo "Hello" > hello.ascii
  1. Perform an ls -l on this file. What is the size?
    • # ls -l hello.ascii
  1. Look at the file using the file command.
    • # file hello.ascii
  1. Look at the file using the hexdump -C command.
    • # hexdump -C hello.ascii
      The file has a length of six bytes: Five letters (48 65 6C 6C 6F) plus the linefeed character (0A).
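The size and the byte values can also be captured in a script. The sketch below uses od, which is assumed here as a widely available alternative to hexdump -C; the temporary directory and the variable names are illustrative.

```shell
workdir=$(mktemp -d)
printf 'Hello\n' > "$workdir/hello.ascii"

# Size in bytes; tr strips the padding that some wc versions add.
size=$(wc -c < "$workdir/hello.ascii" | tr -d ' ')

# One hexadecimal value per byte; tr removes spaces and newlines.
hex=$(od -An -tx1 "$workdir/hello.ascii" | tr -d ' \n')

echo "size: $size bytes"
echo "hex:  $hex"

rm -rf "$workdir"
```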
  1. Convert the file to UTF-8. What is the size of the file? What is the type? What is the content?
    • # iconv -f ascii -t utf8 hello.ascii -o hello.utf8
    • # ls -l hello.utf8
    • # file hello.utf8
    • # hexdump -C hello.utf8
      If a file only contains characters from the original 7-bit ASCII set, then there is no difference between ASCII and UTF-8, as UTF-8 is backward compatible with ASCII.
  1. Convert the file to ISO-8859-1. What is the size of the file? What is the type? What is the content?
    • # iconv -f ascii -t iso_8859-1 hello.ascii -o hello.iso
    • # ls -l hello.iso
    • # file hello.iso
    • # hexdump -C hello.iso
      ISO-8859-1 is also compatible with ASCII, because it only adds definitions for the upper 128 values of an 8-bit character set. The lower 128 characters of ISO-8859-1 are the ASCII characters. So if your file only contains ASCII characters, there is no difference between ASCII and ISO-8859-1.
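That the ASCII, UTF-8 and ISO-8859-1 versions of a pure-ASCII file are byte-for-byte identical can be confirmed with cmp rather than by comparing hex dumps by eye. A minimal sketch, assuming iconv and cmp are installed; the files are created in a temporary directory so that nothing in your home directory is overwritten.

```shell
workdir=$(mktemp -d)
printf 'Hello\n' > "$workdir/hello.ascii"

iconv -f ASCII -t UTF-8 "$workdir/hello.ascii" > "$workdir/hello.utf8"
iconv -f ASCII -t ISO-8859-1 "$workdir/hello.ascii" > "$workdir/hello.iso"

# cmp -s prints nothing and only sets the exit status: 0 means identical.
if cmp -s "$workdir/hello.ascii" "$workdir/hello.utf8" &&
   cmp -s "$workdir/hello.ascii" "$workdir/hello.iso"; then
    identical=yes
else
    identical=no
fi
echo "identical: $identical"

rm -rf "$workdir"
```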
  1. Convert the file to UTF-16. What is the size of the file? What is the type? What is the content?
    • # iconv -f ascii -t utf16 hello.ascii -o hello.utf16
    • # ls -l hello.utf16
    • # file hello.utf16
    • # hexdump -C hello.utf16
      UTF-16 encoding is not backward compatible with ASCII. UTF-16 encodes most characters in 16 bits (2 bytes), while some need 32 bits (4 bytes). Also, the UTF-16 file written by iconv starts with a Byte Order Mark (BOM, here FF FE), which allows you to distinguish between Least-Significant-Byte-First and Most-Significant-Byte-First encodings. UTF-16 is not often used for text files on Linux.
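The role of the BOM becomes visible when the byte order is named explicitly. The sketch below assumes an iconv that knows the encoding name UTF-16LE (glibc does); with an explicit byte order no BOM is needed, so only the two-byte characters remain. The variable names are illustrative.

```shell
# UTF-16LE names the byte order explicitly: every ASCII character
# becomes its own byte followed by a 00 byte.
hex_le=$(printf 'Hello\n' | iconv -f ASCII -t UTF-16LE | od -An -tx1 | tr -d ' \n')

# Plain UTF-16 leaves the byte order to iconv, which marks its choice
# with a BOM (assumption: glibc iconv; other versions may differ).
hex_bom=$(printf 'Hello\n' | iconv -f ASCII -t UTF-16 | od -An -tx1 | tr -d ' \n')

echo "UTF-16LE: $hex_le"
echo "UTF-16:   $hex_bom"
```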
  1. Check whether the hexedit command is installed. If it is not, install it. This is usually a standalone package.
    • # which hexedit
    • If it is not installed:
      # yum -y install hexedit
  1. Create a file hello2.iso with contents "Hello". Use the hexedit command to edit this file: you are going to replace the second character ("e", hexadecimal 65) with the letter "é" (ISO-8859-1 character E9). After editing, the file should contain these six bytes: 48 E9 6C 6C 6F 0A.
    • # echo "Hello" > hello2.iso
    • # hexedit hello2.iso
      In hexedit you type the hexadecimal code of the bytes you want in your file. The left column is the position in the file, the middle column is the hexadecimal representation, and the right column shows the ASCII representation, if available. Use Ctrl-X to save the file and exit hexedit.
  1. Look at the length of the file, the type of the file and the contents of the file. What happens?
    • # ls -l hello2.iso
    • # file hello2.iso
    • # hexdump -C hello2.iso
    • # cat hello2.iso
      Depending on the settings of your terminal emulator, the file may or may not be displayed properly. If necessary, adjust the settings of your SSH client so that the file is displayed properly.
  1. Convert the file to UTF-8. Look at the results.
    • # iconv -f iso_8859-1 -t utf8 hello2.iso -o hello2.utf8
    • # ls -l hello2.utf8
    • # file hello2.utf8
    • # hexdump -C hello2.utf8
      In UTF-8 the é character is represented as a 2-byte sequence: hexadecimal C3 A9.
    • # cat hello2.utf8
      Depending on the settings of your terminal emulator, the file may or may not be displayed properly. If necessary, adjust the settings of your SSH client so that the file is displayed properly.
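The same bytes can be produced without an interactive editor, which is handy for checking your result. This sketch is an alternative to the hexedit step, not a replacement for the exercise: it writes the byte E9 with a printf octal escape and pipes it through iconv and od.

```shell
# \351 is the octal escape for byte E9, the ISO-8859-1 code of "é".
hex_iso=$(printf 'H\351llo\n' | od -An -tx1 | tr -d ' \n')

# After conversion to UTF-8 the single byte E9 becomes the pair C3 A9.
hex_utf8=$(printf 'H\351llo\n' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1: $hex_iso"
echo "UTF-8:      $hex_utf8"
```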
  1. Convert the file to UTF-16. View the length, type and contents.
    • # iconv -f iso_8859-1 -t utf16 hello2.iso -o hello2.utf16
    • # ls -l hello2.utf16
    • # file hello2.utf16
    • # hexdump -C hello2.utf16
    • # cat hello2.utf16
      Depending on the settings of your terminal emulator, the file may or may not be displayed properly. If necessary, adjust the settings of your SSH client so that the file is displayed properly. Note that PuTTY doesn't have built-in support for UTF-16.
  1. Try to convert the file to ASCII. Does this work?
    • # iconv -f iso_8859-1 -t ascii hello2.iso -o hello2.ascii
      This command will fail, because there is no ASCII representation of the character é.
  1. Try to convert the file to ASCII again, but this time use the //translit option of iconv (see the manual page) to replace characters that cannot be converted literally with a similar character. Note that what counts as a "similar" character is defined by your Locale, so make sure your Locale is set to, for instance, Dutch.
    • # export LC_ALL="nl_NL"
    • # iconv -f iso_8859-1 -t ascii//translit hello2.iso -o hello2.ascii
    • # ls -l hello2.ascii
    • # file hello2.ascii
    • # hexdump -C hello2.ascii
      You can see that the é character has been translated to a regular e: hexadecimal character 65.
    • # cat hello2.ascii
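The difference between the strict and the transliterating conversion can be seen in the exit status of iconv. The following is a sketch assuming glibc iconv, which exits with a non-zero status when a character cannot be converted; the replacement chosen by //TRANSLIT depends on the iconv version and the active Locale, so it is printed here rather than assumed.

```shell
# Strict conversion: "é" (byte E9, octal 351) has no ASCII equivalent.
# The exit status is captured instead of letting the script stop.
strict_status=0
printf 'H\351llo\n' | iconv -f ISO-8859-1 -t ASCII >/dev/null 2>&1 || strict_status=$?

# With //TRANSLIT, iconv substitutes a replacement character instead.
translit=$(printf 'H\351llo\n' | iconv -f ISO-8859-1 -t ASCII//TRANSLIT 2>/dev/null || true)

echo "strict exit status: $strict_status"
echo "transliterated:     $translit"
```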
  1. (Optional) Repeat the previous steps, but this time use the Euro-symbol (€). The Euro-symbol has Unicode code point U+20AC and is UTF-8 encoded as E2 82 AC (three bytes).
    The Euro symbol is not part of ISO-8859-1, so this conversion will go wrong. But it is part of ISO-8859-15, where it has hexadecimal value A4. To view the symbol correctly when you view the euro.iso file with cat, you will have to set your SSH client to use ISO-8859-15. If you don't do this, then the symbol will be shown as the generic currency sign (¤). (This is character A4 in ISO-8859-1, and has Unicode code point U+00A4.)
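The byte values mentioned for the Euro symbol can be checked as follows. This is a sketch assuming an iconv that supports ISO-8859-15 (glibc does); the symbol is written with printf octal escapes so that the result does not depend on the encoding of your terminal.

```shell
# \342\202\254 are the octal escapes for E2 82 AC, the UTF-8 form of "€".
hex_utf8=$(printf '\342\202\254\n' | od -An -tx1 | tr -d ' \n')

# In ISO-8859-15 the same character is the single byte A4.
hex_iso15=$(printf '\342\202\254\n' | iconv -f UTF-8 -t ISO-8859-15 | od -An -tx1 | tr -d ' \n')

# ISO-8859-1 has no Euro symbol, so this conversion is expected to fail.
latin1_status=0
printf '\342\202\254\n' | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1 || latin1_status=$?

echo "UTF-8:       $hex_utf8"
echo "ISO-8859-15: $hex_iso15"
echo "ISO-8859-1 exit status: $latin1_status"
```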
End of exercise