Add Tifinagh languages and tests. #292

Open
scossu wants to merge 3 commits into main from tifinagh

Conversation

@scossu
Collaborator

@scossu scossu commented Apr 5, 2026

@RandyBarry see the attached test results. There seems to be a problem transliterating the comma character, and possibly a couple of other characters; I can't tell yet whether those are caused by an incorrect test pair or an incorrect mapping.
tamashek.csv
tamazight_moroccan.csv
tifinagh_generic.csv

@RandyBarry
Collaborator

RandyBarry commented Apr 5, 2026 via email

@scossu
Collaborator Author

scossu commented Apr 5, 2026

I have updated the mappings and the tests. There are still a few outstanding issues:

tamashek.csv
tamazight_moroccan.csv
tifinagh_generic.csv

@RandyBarry
Collaborator

What are the outstanding issues in the mappings for Tifinagh script and/or the languages that use it? I don't see the before/after comparisons.

@scossu
Collaborator Author

scossu commented Apr 7, 2026

Sorry, I had attached your test files instead of the reports.

Here are the correct files:

test_tamashek.log
test_tamazight_moroccan.log
test_tifinagh_generic.log

As you can see, some of the issues are related to the comma + combining underscore we discussed over email.

E.g. "Imdanen, akken" transliterates to "ⵉⵎⴷⴰⵏⴻⵏ, ⴰⴽⴽⴻⵏ" instead of "ⵉⵎⴷⴰⵏⴻⵏ⵰ ⴰⴽⴽⴻⵏ" because the lone comma is not mapped; only the comma followed by a combining underscore is. Conversely, "ⵉⵎⴷⴰⵏⴻⵏ⵰ ⴰⴽⴽⴻⵏ" transliterates to " Imdanen,̲ akken" instead of the expected "Imdanen, akken".
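
The behavior described above can be reproduced with a minimal sketch. This is not the project's transliteration code: the `ROMAN_TO_TIFINAGH` table and the `transliterate` helper are hypothetical names made up for illustration, and the table holds only the one separator entry discussed in this thread. It assumes the table keys on a comma followed by U+0332 (COMBINING LOW LINE) and maps it to U+2D70 (TIFINAGH SEPARATOR MARK).

```python
# Hypothetical, minimal mapping: only the separator discussed in this thread.
COMMA_LOW_LINE = "\u002C\u0332"   # comma followed by COMBINING LOW LINE
TIFINAGH_SEPARATOR = "\u2D70"     # TIFINAGH SEPARATOR MARK

ROMAN_TO_TIFINAGH = {COMMA_LOW_LINE: TIFINAGH_SEPARATOR}
TIFINAGH_TO_ROMAN = {v: k for k, v in ROMAN_TO_TIFINAGH.items()}


def transliterate(text, table):
    """Greedy longest-match replacement; unmapped characters pass through."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# A lone comma has no entry, so it is copied through unchanged.
print(ascii(transliterate("a, b", ROMAN_TO_TIFINAGH)))
# Comma + U+0332 matches the two-code-point key and becomes U+2D70.
print(ascii(transliterate("a,\u0332 b", ROMAN_TO_TIFINAGH)))
# The reverse table emits comma + U+0332, not a bare comma.
print(ascii(transliterate("a\u2D70 b", TIFINAGH_TO_ROMAN)))
```

Printing with `ascii()` shows the escaped code points, so the result does not depend on how a terminal renders the combining mark.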

@RandyBarry
Collaborator

Stefano: an unconverted lone comma is okay. Users of the Tifinagh script use Latin punctuation, such as the regular comma, in addition to the special Tifinagh separator. A regular comma should remain a comma, whereas the comma with low line converts to the Tifinagh separator. We need to test both the regular comma AND the special separator. Both seem to be fine now.

@scossu
Collaborator Author

scossu commented Apr 19, 2026

@RandyBarry Just to confirm: in my terminal (though I think this is common behavior), the sequence \u002C\u0332 displays as a simple comma followed by the combining underscore, which underlines the following character, not the comma. You are referring to an underlined comma, so I wonder if you intended to use \u0332\u002C (combined underlined comma) in the transliteration table.

If the transliteration table is correct, then I have to adjust the test strings so that "ⵉⵎⴷⴰⵏⴻⵏ⵰ ⴰⴽⴽⴻⵏ" transliterates to "Imdanen, ̲akken".

@RandyBarry
Collaborator

RandyBarry commented Apr 19, 2026 via email

@scossu
Collaborator Author

scossu commented Apr 19, 2026

I think I found out why we haven't been aligning. The terminal I am using is not handling the combining underscore correctly, and it's displaying it one character too late:

1776634625_screenshot

Another terminal (with better Unicode support) handles it correctly:

1776634613_screenshot

So what I was seeing in my Tifinagh tests was a comma followed by an underscored space (or whatever followed the comma), which looked like a very strange morphological feature.

All clear now. It was just a display issue on my end. I will review the test files and merge this PR later. Thanks.
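
A display issue like this can be ruled out without relying on how a terminal renders the text, by listing the code points of a string. The sketch below uses only Python's standard `unicodedata` module; the `describe` helper is a hypothetical name for illustration, not part of the project.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME (combining class N)' entry per code point."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} "
        f"(combining class {unicodedata.combining(ch)})"
        for ch in text
    ]


# Inspect the sequence from the transliteration table: comma, then U+0332.
for line in describe("\u002C\u0332"):
    print(line)
```

A non-zero combining class marks a combining character, which in Unicode applies to the base character that precedes it. In \u002C\u0332 the low line therefore belongs to the comma, whatever a particular terminal shows.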

@scossu
Collaborator Author

scossu commented Apr 19, 2026

Just a few more tests failing:

test_tamashek.log
test_tamazight_moroccan.log
test_tifinagh_generic.log

@RandyBarry
Collaborator

RandyBarry commented Apr 20, 2026 via email
