Front-End Web Strategies

Specifying language information

How HTTP & HTML specify language

  1. HTTP Content-Language header
  2. HTML <meta> equivalent
  3. An HTML lang attribute defined on a parent
  4. A lang attribute on the element
(least to most specific)
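
As a rough sketch of where each of those four sources lives, the hypothetical page below sets all of them. The header appears only as a comment because the server sends it rather than the document, and the German paragraph is invented for illustration:

```html
<!DOCTYPE html>
<!-- 3. lang on a parent: <html> is the parent of everything on the page -->
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- 1. Sent by the server as an HTTP response header:
          Content-Language: en -->
  <!-- 2. The <meta> equivalent of that header -->
  <meta http-equiv="Content-Language" content="en">
  <title>Language sources</title>
</head>
<body>
  <p>Hello world</p>
  <!-- 4. lang on the element itself is the most specific source -->
  <p lang="de">Hallo Welt</p>
</body>
</html>
```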

Recommendation: specify the primary language in the lang attribute on your <html> element and where it changes:

<html lang="en">

<blockquote lang="la">Lorem ipsum dolor sit amet…</blockquote>

Applying per-language styles

CSS specifies the :lang() pseudo-class, which allows your stylesheets to target specific languages:

:lang(zh) { font-family: "Sim Sun"; }

This is better than the older class or attribute selector approach, which we had to use when versions of Internet Explorer prior to 8 were dominant:

[lang=zh] { font-family: "Sim Sun"; }

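:lang() uses the HTML hierarchy, so an element without its own lang attribute matches on the language it inherits, and it falls back to the base language, so :lang(zh) also matches zh-yue. This makes it much easier to style complex HTML like this without having to carefully adjust your selectors' specificity: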

<div lang="es">
    El Ejemplo:
    <blockquote lang="en-US">
        <q>My hovercraft is full of eels</q> is
        <q lang="is">Svifnökkvinn minn er fullur af álum</q> in Icelandic
        and <q lang="zh-yue">我隻氣墊船裝滿晒鱔</q> in Cantonese
    </blockquote>
</div>
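
To illustrate with the markup above, a stylesheet along these lines needs only one rule per language. The specific styles are placeholder assumptions chosen to make the matching visible, not recommendations:

```html
<style>
  /* Matches the outer <div>, whose language is Spanish */
  :lang(es) { font-style: italic; }

  /* Matches the <blockquote lang="en-US"> and its first <q>,
     which has no lang attribute and so inherits en-US */
  :lang(en) { font-style: normal; }

  /* Matches lang="zh-yue" because :lang() also matches by language prefix */
  :lang(zh) { font-family: "Sim Sun"; }

  /* By contrast, this matches only elements whose attribute is exactly
     lang="zh", so the Cantonese quote would be missed */
  [lang=zh] { font-family: "Sim Sun"; }
</style>
```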

Dealing with mixed LTR and RTL content

Text display can get complicated when you need to mix left-to-right and right-to-left characters. The W3C has a great example:

This example shows an Arabic phrase in the middle of an English sentence. The exclamation point is part of the Arabic phrase and should appear on its left. Because it is between an Arabic and Latin character and the overall paragraph direction is LTR, the bidirectional algorithm positions the exclamation mark to the right of the Arabic phrase.

The title is مفتاح معايير الويب! in Arabic.

Unicode provides several characters to control the displayed text direction. These can be confusing because they have no visible glyph, but they are the best option when you need to correct a field which cannot contain HTML: add a LEFT-TO-RIGHT MARK (U+200E) or RIGHT-TO-LEFT MARK (U+200F). In our example above, we can add an RLM after the exclamation point to tell the text layout engine that it should be included in the RTL text:

The title is مفتاح معايير الويب!‏ in Arabic.
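
Where the text does end up inside HTML, the same mark can be written as a character reference so that it stays visible in the source. A minimal sketch, reusing the sentence above:

```html
<!-- &rlm; is the named reference for U+200F; &#x200F; is equivalent -->
<p>The title is مفتاح معايير الويب!&rlm; in Arabic.</p>
```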

HTML allows you to specify dir="ltr" or dir="rtl". The example above uses this markup: <q lang="ar">; this one uses <q lang="ar" dir="rtl">:

The title is مفتاح معايير الويب! in Arabic.

HTML also specifies the bdo tag which allows you to override direction if you were otherwise going to add a semantically meaningless span tag:

The title is مفتاح معايير الويب! in Arabic.

CSS allows you to set direction and unicode-bidi, which is particularly powerful with the :lang() pseudo-class:

:lang(ar) { direction: rtl; unicode-bidi: embed; }

The title is مفتاح معايير الويب! in Arabic.
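
The rendered examples in this section do not show their source, so here is a sketch of the markup each approach might use. The exact markup is an assumption reconstructed from the descriptions above, not the original page source:

```html
<style>
  /* CSS approach: every Arabic-language element becomes an RTL embedding */
  :lang(ar) { direction: rtl; unicode-bidi: embed; }
</style>

<!-- dir attribute on an element which already carries the language -->
<p>The title is <q lang="ar" dir="rtl">مفتاح معايير الويب!</q> in Arabic.</p>

<!-- bdo in place of a meaningless span; bdo requires a dir attribute -->
<p>The title is <bdo lang="ar" dir="rtl">مفتاح معايير الويب!</bdo> in Arabic.</p>

<!-- language only; the :lang(ar) rule above supplies the direction -->
<p>The title is <span lang="ar">مفتاح معايير الويب!</span> in Arabic.</p>
```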

Web Fonts

CSS3 @font-face is widely available, and there are now many webfont options for multilingual sites which don't want to use “Arial Unicode MS” for everything. This is particularly powerful because browsers won't download a webfont until it's actually needed, and the unicode-range descriptor allows you to target character ranges to specify fonts for particular languages or override specific characters (e.g. low-quality numbers in a great Chinese font):

@font-face {
    font-family: 'syriac_font';

    src: url('../webfont/syriac/syriac-webfont.eot');  /* IE fallback */
    src: local('Estrangelo Edessa'),
         url('../webfont/syriac/syriac-webfont.eot') format('embedded-opentype'),
         url('../webfont/syriac/syriac-webfont.woff') format('woff');

    unicode-range: U+0700-074F;
}
body {
    font-family: sans-serif, 'syriac_font';
}

Ruby

Some Asian languages have an annotation system for providing pronunciation or other information which is displayed next to the primary text:

攻殻(こうかく)機動隊(きどうたい)

Most browsers have some support for ruby but this varies and not all of the CSS3 Ruby spec is available.
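
A minimal sketch of how the annotated title above could be marked up with the HTML ruby elements. The rp elements supply fallback parentheses for browsers without ruby support, and everything is kept on one line so that no stray whitespace appears between the characters:

```html
<p lang="ja"><ruby>攻殻<rp>(</rp><rt>こうかく</rt><rp>)</rp>機動隊<rp>(</rp><rt>きどうたい</rt><rp>)</rp></ruby></p>
```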

Vertical Text

Some languages should be written vertically rather than horizontally. Support for this on the web has been challenging for a long time, and this is an area where Internet Explorer was years ahead, unfortunately using syntax which differs from the draft W3C CSS Writing Modes Level 3 specification.

The writing-mode property is not supported by Firefox prior to version 41, but writing-mode: vertical-lr works in Safari and Chrome when browser-prefixed. Internet Explorer 9+ supports -ms-writing-mode with different values. Be very careful about relying on this feature.
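
A hedged sketch of how those variants might be combined for left-to-right vertical text such as Mongolian. The class name vertical is hypothetical, and the tb-lr value for Internet Explorer is given on the assumption that it is the older equivalent of vertical-lr; test in the browsers you need to support:

```html
<style>
  .vertical {
    -ms-writing-mode: tb-lr;            /* Internet Explorer's older value */
    -webkit-writing-mode: vertical-lr;  /* prefixed Safari and Chrome */
    writing-mode: vertical-lr;          /* CSS Writing Modes Level 3 */
  }
</style>

<!-- Placeholder content: put the vertical text here -->
<p class="vertical" lang="mn">…</p>
```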

Mongolian text borrowed from a W3C test

ᠬᠤᠯᠤᠭᠠᠨᠠ ᠡᠨᠳᠡ ᠡᠨᠡ ᠭ᠋ᠠᠯ ᠴᠢᠷᠢᠭ ᠬᠠᠨ᠎ᠠ

ᠤᠷᠲᠤ ᠡᠨᠳᠡ ᠡᠨᠡ ᠭ᠋ᠠᠯ ᠴᠢᠷᠢᠭ
