  • Text (Data) and Fonts in digital Malayalam

    Rajeesh K V

    October 31, 2019

    CTO, TEXByte Solutions

  • About the speaker

    • Entrepreneur
    • ERP software architect
    • Free (സ്വതന്ത്ര, as in freedom) software developer & user
    • Programmer
    • Jack of many trades, master of some

    © 2019 Rajeesh K V

  • Data v/s Presentation

  • Text/Data/പാഠം/വിവരം

    • Computers don’t recognize ‘A’ or ‘ക’. They know only the numbers 0x41 and 0x0D15 instead

    • Text/data are stored as ‘code points’ (numbers assigned according to an agreed ‘standard’): the ‘encoding’

    • ASCII: 0x41 → ‘A’, 0x42 → ‘B’, … but no code for ‘ക’!
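
The claim on this slide can be checked in a few lines. The following is a minimal sketch in Python (the language and the helper names `code_point` and `fits_in_ascii` are assumptions of this example, not part of the talk). It prints the numbers behind ‘A’ and ‘ക’ and shows that ASCII has no code for the latter.

```python
def code_point(ch: str) -> str:
    """Format the code point of one character as hex (2 digits below 0x100, else 4)."""
    value = ord(ch)
    width = 2 if value < 0x100 else 4
    return f"0x{value:0{width}X}"


def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(code_point("A"))        # Latin capital letter A
print(code_point("\u0d15"))   # Malayalam letter ka
print(fits_in_ascii("A"), fits_in_ascii("\u0d15"))
```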

  • ASCII — Data

    t    e    x    t    ← ASCII (encoding)
    ↑    ↑    ↑    ↑
    0x74 0x65 0x78 0x74 ← data
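
The same round trip, from stored bytes to characters and back, can be sketched in Python. This is only an illustration; the variable names are made up for the example.

```python
# The four bytes from the slide, exactly as they would sit in a file.
data = bytes([0x74, 0x65, 0x78, 0x74])

# Decoding applies the ASCII table: each number is looked up as a character.
text = data.decode("ascii")

# Encoding goes the other way, from characters back to numbers.
round_trip = text.encode("ascii")

print(text)
print(round_trip.hex(" "))  # bytes.hex() with a separator needs Python 3.8+
```

Nothing about a font appears here: the bytes and the encoding alone determine the text.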

  • ASCII — Presentation

    t    e    x    t    ← font
    ↑    ↑    ↑    ↑
    t    e    x    t    ← ASCII
    ↑    ↑    ↑    ↑
    0x74 0x65 0x78 0x74 ← data

  • ASCII — Presentation

    Change the font, and…

    t    e    x    t    ←  font   →  t    e    x    t
    ↑    ↑    ↑    ↑                 ↑    ↑    ↑    ↑
    t    e    x    t    ←  ASCII  →  t    e    x    t
    ↑    ↑    ↑    ↑                 ↑    ↑    ↑    ↑
    0x74 0x65 0x78 0x74 ←  data   →  0x74 0x65 0x78 0x74

  • Font

    [Two figure slides; images not included in this transcript]

  • ASCII chart

    [Figure: ASCII chart. Source: Shawn Hymel]

  • Malayalam

  • Code point for Malayalam

    • No code points for Malayalam characters in ASCII!

    • Then how did ISM and PageMaker work?

    or

    What you see is not what you have


  • ASCII font glyphs — ML TT Revathi

    [Figure: glyph table of the ASCII-encoded font ML TT Revathi; image not included]

  • ISCII Malayalam

    t    e    x    t    ←  font   →  I    ¯    p    I
    ↑    ↑    ↑    ↑                 ↑    ↑    ↑    ↑
    t    e    x    t    ←  ASCII  →  t    e    x    t
    ↑    ↑    ↑    ↑                 ↑    ↑    ↑    ↑
    0x74 0x65 0x78 0x74 ←  data   →  0x74 0x65 0x78 0x74

  • ISCII Malayalam

    Change the font, and…

    t    e    x    t    ←  font   →  ç    «    õ    ç
    ↑    ↑    ↑    ↑                 ↑    ↑    ↑    ↑
    t    e    x    t    ←  ASCII  →  t    e    x    t
    ↑    ↑    ↑    ↑                 ↑    ↑    ↑    ↑
    0x74 0x65 0x78 0x74 ←  data   →  0x74 0x65 0x78 0x74

  • [Figure; image not included. Source: Ashok Kumar P K]

  • ASCII — problems

    • Only 8 bits (1 byte) to represent any character, so at most 256 characters

    • Text/Data (പാഠം) is still Latin, not Malayalam

    • Information interchange needs ⟨document + font⟩ together

    • Sorting (അകാരാദിക്രമം, alphabetical order) follows the Latin codes, not Malayalam order

    • Searching (information retrieval) for Malayalam text fails
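
A small Python sketch makes these problems concrete. The glyph table `LEGACY_FONT` below is HYPOTHETICAL: it stands in for the idea of an ASCII-encoded font and is not the real table of ML TT Revathi or any other font.

```python
# HYPOTHETICAL glyph table for an ASCII-encoded Malayalam font: the Malayalam
# shape the font draws for each Latin code. Real fonts have their own tables.
LEGACY_FONT = {
    "I": "\u0d15",                   # drawn as ka
    "\u00af": "\u0d24\u0d4d\u0d24",  # drawn as the conjunct tta
    "p": "\u0d41",                   # drawn as the vowel sign u
}

stored = "I\u00afpI"                              # what is saved in the document
shown = "".join(LEGACY_FONT[c] for c in stored)   # what the reader sees


def is_malayalam(ch: str) -> bool:
    """True if ch lies in the Unicode Malayalam block (U+0D00 to U+0D7F)."""
    return 0x0D00 <= ord(ch) <= 0x0D7F


print([f"{ord(c):04X}" for c in stored])        # Latin-range codes only
print(any(is_malayalam(c) for c in stored))     # no Malayalam in the data
print("\u0d15" in stored)                       # searching for ka finds nothing
```

Without the exact font, `stored` is just a string of Latin characters, which is why interchange, sorting and searching all break.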

  • Data outlive software

    [Figure; image not included. Source: Martin Malmsten, National Library of Sweden]

  • Unicode*

    • Unique code points for (almost) every writing system (‘script’/ലിപി) in the world

    • Text/Data (പാഠം) always represents one particular script, which several languages may share (Devanagari: Hindi/Sanskrit)

    • Only basic characters are encoded, not conjuncts (usually)

    • Standard agreed on and supported by many operating systems and application software

    • Preferred encoding for data interchange, the Web, government documents…

    * www.unicode.org
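
One way to see that every code point belongs to a particular script is to ask the Unicode character database for its name. A short sketch using Python's standard `unicodedata` module (the helper `describe` is a name made up for this example):

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for one character, using Python's bundled Unicode data."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The letter 'ka' has its own code point in each script.
for ch in ("A", "\u0915", "\u0d15"):
    print(describe(ch))
```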

  • Unicode Malayalam†

    [Figure: Unicode Malayalam code chart; image not included]

    † www.unicode.org/charts/PDF/U0D00.pdf

  • Unicode Malayalam

    • ‘ക’ → 0D15

    • ‘കൊ’ → ക + ◌ൊ → 0D15 0D4A

    • ‘ക്ക’ → ക + ◌് + ക → 0D15 0D4D 0D15

    • Use any Unicode Malayalam font to display data

    or

    What you see is what you have
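
The sequences above can be verified directly. In this Python sketch the constant names `KA`, `SIGN_O` and `VIRAMA` are labels chosen for the example; the code points are the ones from the Unicode Malayalam chart.

```python
def code_points(text: str) -> list[str]:
    """List the code points of text as 4-digit hex strings."""
    return [f"{ord(ch):04X}" for ch in text]


KA = "\u0d15"      # Malayalam letter ka
SIGN_O = "\u0d4a"  # Malayalam vowel sign o
VIRAMA = "\u0d4d"  # Malayalam sign virama (chandrakkala)

conjunct_kka = KA + VIRAMA + KA

print(code_points(KA))
print(code_points(KA + SIGN_O))
print(code_points(conjunct_kka))
print(len(conjunct_kka))  # drawn as one conjunct, stored as three code points
```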

  • Unicode Malayalam

    t    e    x    t    ←  font    →  ക         ത്തു         ക
    ↑    ↑    ↑    ↑                  ↑    ↑    ↑    ↑    ↑    ↑
    t    e    x    t    ←  Unicode →  ക    ത    ◌്    ത    ◌ു    ക
    ↑    ↑    ↑    ↑                  ↑    ↑    ↑    ↑    ↑    ↑
    74   65   78   74   ←  data    →  0D15 0D24 0D4D 0D24 0D41 0D15

  • Unicode Malayalam

    Change the font, and…

    t    e    x    t    ←  font    →  ക       ത്ത       ◌ു    ക
    ↑    ↑    ↑    ↑                  ↑    ↑    ↑    ↑    ↑    ↑
    t    e    x    t    ←  Unicode →  ക    ത    ◌്    ത    ◌ു    ക
    ↑    ↑    ↑    ↑                  ↑    ↑    ↑    ↑    ↑    ↑
    74   65   78   74   ←  data    →  0D15 0D24 0D4D 0D24 0D41 0D15

  • Data entry — Inscript

    k    m    s    j    ← keyboard
    ↓    ↓    ↓    ↓    ← input method
    ക    സ    ◌േ    ര    ← Unicode
    ↓    ↓    ↓    ↓
    0D15 0D38 0D47 0D30 ← data
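
Conceptually, an Inscript input method is a per-key lookup table. The Python sketch below contains only the four keys shown on the slide; a full layout covers the whole keyboard, and the names `INSCRIPT_KEYS` and `type_keys` are assumptions of this example.

```python
# Only the four keys shown on the slide, not a complete Inscript layout.
INSCRIPT_KEYS = {
    "k": "\u0d15",  # ka
    "m": "\u0d38",  # sa
    "s": "\u0d47",  # vowel sign ee
    "j": "\u0d30",  # ra
}


def type_keys(keys: str) -> str:
    """Translate a sequence of key presses into Unicode text, one key at a time."""
    return "".join(INSCRIPT_KEYS[key] for key in keys)


word = type_keys("kmsj")
print([f"{ord(ch):04X}" for ch in word])
```

The vowel sign is typed after its consonant, in the order it is spoken, even though it is drawn before the consonant.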

  • Data entry — Transliteration

    ka   s    E    ra   ← keyboard
    ↓    ↓    ↓    ↓    ← input method
    ക    സ    ◌േ    ര    ← Unicode
    ↓    ↓    ↓    ↓
    0D15 0D38 0D47 0D30 ← data
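
A transliteration input method has to match multi-letter tokens, so a plain per-key lookup is not enough. The sketch below uses greedy longest-match over a toy table holding just the four tokens from the slide. It is an illustration under that assumption and does not reproduce any real transliteration scheme.

```python
# Toy table: only the tokens shown on the slide. Real schemes are far larger
# and treat vowels contextually.
TABLE = {"ka": "\u0d15", "s": "\u0d38", "E": "\u0d47", "ra": "\u0d30"}
MAX_TOKEN = max(len(token) for token in TABLE)


def transliterate(latin: str) -> str:
    """Convert Latin input to Unicode by always taking the longest matching token."""
    out = []
    i = 0
    while i < len(latin):
        for size in range(min(MAX_TOKEN, len(latin) - i), 0, -1):
            token = latin[i:i + size]
            if token in TABLE:
                out.append(TABLE[token])
                i += size
                break
        else:
            raise ValueError(f"no rule for {latin[i]!r} at position {i}")
    return "".join(out)


print([f"{ord(ch):04X}" for ch in transliterate("kasEra")])
```

Both input methods end in the same stored data; only the keystrokes differ.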

  • Pitfalls

    Some common mistakes in typing/data entry

    or

    The great ISM/പുതിയലിപി (new-script) hangover

    ASCII                 Unicode                 Shaping
    ക ◌േ സ ര              ക സ ◌േ ര                കസേര
    ◌്ര ഗ ◌ീ സ ◌്          ഗ ◌് ര ◌ീ സ ◌്           ഗ്രീസ്     (I'm looking at you, Manorama!)
    ക ത്ത ◌ു ക            ക ത ◌് ത ◌ു ക            കത്തുക
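
Some of these mistakes can be caught mechanically. The sketch below flags a word whose first code point is a dependent sign, which is what results when the pre-base form is typed before its consonant, as in the second row. The function name and the sample strings are assumptions made for illustration.

```python
import unicodedata


def starts_with_dependent_sign(word: str) -> bool:
    """True if the first code point is a combining mark (vowel sign, virama, ...)."""
    return bool(word) and unicodedata.category(word[0]).startswith("M")


# Visual (ASCII-habit) order: virama + ra typed before ga.
visual_order = "\u0d4d\u0d30\u0d17\u0d40\u0d38\u0d4d"
# Logical (Unicode) order: ga + virama + ra + vowel sign ii + sa + virama.
logical_order = "\u0d17\u0d4d\u0d30\u0d40\u0d38\u0d4d"

print(starts_with_dependent_sign(visual_order))
print(starts_with_dependent_sign(logical_order))
```

This check cannot catch the first row: a vowel sign placed after the wrong consonant still forms a well-formed sequence, just a different word.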

  • The ISM/പുതിയലിപി (new-script) hangover

    [Figure; image not included]

  • Data outlive software

    • Data/പാഠം must be stored for the future

    • Decouple data from software/formatting

    • Data must be searchable

    • Corollary: Data must be archivable

    • The one thing worse than ASCII documents: scanned documents

    • TEI: Text Encoding Initiative

    • Sayahna Foundation

  • Text shaping

  • Complex text shaping

    Unicode solves one part of the problem (data). Complex scripts, unlike Latin, change the shape and order of glyphs.

    G N U → GNU
    f f i → ffi (one ligature glyph)
    ഗ ◌് ന ◌ു → ഗ്നു
    ക സ ◌േ ര → കസേര

  • What is ‘text shaping’?

    [Figure; image not included]

  • Unicode Malayalam font glyphs — Rachana

    [Figure: glyph table of the Unicode font Rachana; image not included]

  • Complex text shaping

    • Font (glyphs) + ‘OpenType’ shaping rules

    • Operating system support is required for proper shaping

    • ക + ◌് + ക → ക്ക (shaping rules)

    • OpenType specification‡: followed by GNU/Linux and Windows applications; Apple uses ‘AAT’

    • HarfBuzz§ shaping engine (libre software), used by GNU/Linux, Qt, GTK, Android, Scribus, XeTeX, LibreOffice...

    • Adobe uses its own shaping engine, which has bugs/issues with shaping. Even they are going to use HarfBuzz!

    ‡ docs.microsoft.com/en-us/typography/opentype/spec/
    § www.freedesktop.org/wiki/Software/HarfBuzz/

  • Complex text shaping engine

    $ hb-shape -v Rachana-Regular.ttf "കൊ"
    1: (കൊ)
    1:
    1: [e1|k1@1112,0|a2@2700,0]

    e1 → ◌െ    k1 → ക    a2 → ◌ാ
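
The glyph line printed by `hb-shape` is easy to take apart in a script. The Python sketch below parses only the form shown on this slide, `name` or `name@x,y`, and assumes the numbers after `@` are the glyph's position. `hb-shape` can also print clusters and advances, which this sketch does not handle. The function name `parse_glyphs` is made up for the example.

```python
import re

# One item of the glyph list: a glyph name, optionally followed by "@x,y".
GLYPH = re.compile(r"^(?P<name>[^@=+]+)(?:@(?P<x>-?\d+),(?P<y>-?\d+))?$")


def parse_glyphs(line: str) -> list[tuple[str, int, int]]:
    """Parse '[e1|k1@1112,0|a2@2700,0]' into (glyph name, x, y) tuples."""
    glyphs = []
    for item in line.strip().strip("[]").split("|"):
        match = GLYPH.match(item)
        if match is None:
            raise ValueError(f"unrecognised glyph item: {item!r}")
        glyphs.append((match["name"], int(match["x"] or 0), int(match["y"] or 0)))
    return glyphs


for name, x, y in parse_glyphs("[e1|k1@1112,0|a2@2700,0]"):
    print(name, x, y)
```

The parsed list shows the reordering at work: the glyph `e1` comes first on the line although the stored text begins with ക.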

  • The lookup rules state machine

    Feature  Meaning                   Example
    pref     pre-base form             ◌് + ര → ◌്ര
    pstf     post-base form            ◌് + വ → ◌്വ, ◌് + യ → ◌്യ
    blwf     below-base form           ◌് + ല → ◌്ല
    akhn     akhand conjuncts          ക + ◌് + ക → ക്ക
    pres     pre-base substitution     ◌്ര + പ → പ്ര
    psts     post-base substitution    … + ◌ു → …
    blws     below-base substitution   പ + ◌്ല → പ്ല
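
To give a feel for what one such lookup does, here is a toy ligature substitution in Python. The glyph names and the single rule are HYPOTHETICAL stand-ins for a font's `akhn` lookup. A real shaping engine such as HarfBuzz reads the rules from the font and applies the features in a defined order, which this sketch does not attempt.

```python
# HYPOTHETICAL glyph names and one rule, standing in for an 'akhn' lookup.
AKHN_RULES = {("ka", "virama", "ka"): "k_ka"}


def apply_ligatures(glyphs: list[str], rules: dict) -> list[str]:
    """Replace every glyph sequence found in rules by its ligature, left to right."""
    longest = max((len(key) for key in rules), default=1)
    out = []
    i = 0
    while i < len(glyphs):
        # Try the longest candidate sequence first, down to two glyphs.
        for size in range(min(longest, len(glyphs) - i), 1, -1):
            key = tuple(glyphs[i:i + size])
            if key in rules:
                out.append(rules[key])
                i += size
                break
        else:
            # No rule matched here: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


print(apply_ligatures(["ka", "virama", "ka", "sign_u"], AKHN_RULES))
```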

  • Kerning — TN Joy

    [Figure; image not included]

  • Complex text shaping — OpenType lookup rules

    [Figure; image not included]

  • Complex text shaping

    • Font = Art + Engineering

    • Design: glyphs, ascender, descender, character spacing, word spacing etc.

    • Programming: OpenType shaping rules (GSUB [glyph substitution], GPOS [kerning, mark positioning] etc.)

    • Shaping engine support: older engines (Windows XP, Pango, Qt4, LibreOffice ≤ 5.x…) v/s newer ones (Windows Vista+ [Uniscribe], HarfBuzz)

  • Shaping issues

    • Shaping of every conjunct may not always work as expected

    • Report bugs (and respect the License)

    rachana.org.in
    smc.org.in

  • Update fonts

    • And it gets fixed

    • Update fonts
    • GNU/Linux: via package update
    • Windows, macOS: uninstall the existing version, then download & install the new version
    • Android: use Magisk

  • Questions?

    Thank you (നന്ദി).
