Handle non-ASCII character filenames in EnrollWithRequestIdBean #175
SirTediousOfFoo wants to merge 10000 commits into Keyfactor:main from
Resolve ECA-10481 "Fb configdump protocol configs" Closes ECA-10481 See merge request ejbca/ejbca!402
ECA-10793: Fix transaction handling in AdminPreferenceSessionBean Closes ECA-10793 See merge request ejbca/ejbca!405
…-alias' into 'main' ECA-10443 Added DNS identifier challenge type selection to ACME Closes ECA-10443 See merge request ejbca/ejbca!406
ECA-10623 Allow quotation marks in the CDP Closes ECA-10623 See merge request ejbca/ejbca!380
ECA-10811: Modified the CertificateCrlReader so that status gets updated Closes ECA-10811 See merge request ejbca/ejbca!385
ECA-10853: Handle failure to load PKCS#11 library, to avoid partial lock-out from GUI Closes ECA-10853 See merge request ejbca/ejbca!408
…'main' ECA-10801: Modified IKB UI page to not display crypto tokens without Closes ECA-10801 See merge request ejbca/ejbca!394
Resolve ECA-10730 "Fb acme mode md issues" Closes ECA-10730 See merge request ejbca/ejbca!409
…b End Entity page
Conflicts: build.xml modules/build-properties.xml modules/build.xml modules/ejbca-rest-configdump/src/org/ejbca/ui/web/rest/api/resource/ConfigdumpRestResource.java
…eparate-edition Conflicts: build.xml
…oval-pages' into 'main' Resolve ECA-10786 "Fb extra fields in ra web end entity and approval pages" Closes ECA-10786 See merge request ejbca/ejbca!412
…'fb-ECA-10843-wsracli-stress-param-no-tests' ECA-10843: ctb EjbcaWsRaCli stress: Make available to specify the number of tests to be run See merge request ejbca/ejbca!403
For easier element selection in DOM.
… title for 1st column)
…ypo) Noticket: English language files fixed (MS Auto-Enrollment: error of button) Noticket: English language files fixed (OAuth Key Management: typo, and spaces, for l10n people) Noticket: English language files fixed (ACME protocol: spaces, typo, format titles) Noticket: Dummy zz language file fixed (remove Id, UNIX End of Line) Noticket: English language files fixed (ACME protocol: minor typo) Noticket: English language files fixed (duplicated message, minor fixes) Noticket: English language files updated (CA Structure & CRLs: better title for 1st column)
…o l10n-en-english-main-7.10-2
…anslated) Noticket: French language files updated (OAuth Key Management: fully translated) Noticket: French language files updated (OAuth Key Management: fixes) Noticket: French language files updated (ACME protocol: spaces) Noticket: French language files updated (ACME protocol: fully translated) Noticket: French language files updated (Azure CRL Publisher) Noticket: French language files updated (Intune Certificate Revocation) Noticket: French language files updated (C-ITS certificates) Noticket: French language files updated (CA Structure & CRLs: 'Download' messages)
… l10n-fr-french-main-7.10-2
….10-2 L10n: English fixes for the main branch (based on 7.10.0.2)
…10-2 L10n: French updates for the main branch (based on 7.10.0.2) Fully translated
if(fileName == null){
    fileName = "certificatetoken";
}
final String fileName = getFileName();
Moved the filename handling logic out of the downloadToken function.
 *
 * @return the file name to use in the content disposition header, filename safe characters
 */
private String getFileName() {
Added a filename handling function like the one in EnrollMakeNewRequestBean.
I'm not too fond of Base64 as a filename; it's very user-unfriendly. Isn't there an Apache string function somewhere that makes strings "filename" friendly?
Ah, I see, you just copied the behavior that was already part of another piece of EJBCA...
Base64 encoding as the last resort is definitely a good way. One potential improvement that could preserve some UTF-8 characters would be to Base64-encode it only if it doesn't pass: …
Uh yeah, it's unwieldy and I didn't really like it either, but seeing as it was already there for a different enrollment option, I went with it as the obvious solution. I'll check out your suggestion and get back to you here; I'm looking at what else could be done.
A nice solution could be replacing UTF-8 characters with corresponding ASCII ones: https://stackoverflow.com/a/4122207/19848036
About the time you posted this, I found https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#stripAccents-java.lang.String- which eluded me at first. I'll test it out and update the PR, although I wonder if the solution you posted would cover a wider range of edge cases.
Alright, I've switched up the filename logic so that it now uses the Apache Commons Lang3 StringUtils, which implements a stripAccents function that does exactly what I wanted to achieve here. I tested out this change with the names … If nothing else, this'll remedy some issues for users with less-than-standard alphabets.
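To make the accent-stripping idea concrete, here is a minimal sketch using only the JDK's `java.text.Normalizer`. It is not the Commons Lang implementation and may differ from it in edge cases; the class and method names (`AccentStripSketch`, `stripAccents`, `isAscii`) are made up for illustration. The approach is to decompose each character (NFD) and then drop the combining marks.

```java
import java.text.Normalizer;

final class AccentStripSketch {

    /** Decomposes characters (NFD) and removes the combining marks that fall out. */
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** True if every character is in the 7-bit ASCII range. */
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only:
        // "malmo.se" with o-umlaut, four Croatian letters with carons/acutes,
        // d-with-stroke, and two CJK characters.
        String[] names = {"malm\u00F6.se", "\u017E\u0161\u0107\u010D", "\u0111", "\u9A6C\u4E91"};
        for (String name : names) {
            String stripped = stripAccents(name);
            System.out.println(stripped + " (ASCII: " + isAscii(stripped) + ")");
        }
    }
}
```

With this particular approach, letters that are a base letter plus a diacritic (ž, š, ć, č, ö) come out as plain ASCII, while characters that have no canonical decomposition, such as đ or anything from a non-Latin script, pass through unchanged.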
    return StringTools.stripFilename(commonName);
}
return Base64.encodeBase64String(commonName.getBytes());
return org.apache.commons.lang3.StringUtils.stripAccents(commonName);
The StringUtils from org.apache.commons.lang doesn't have a stripAccents method, so to keep everything else in the code as it was, I just went with the fully qualified lang3 StringUtils like this.
Unfortunately, …

    @Test
    public void testTextNormalisation() {
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("Test CA")));
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("malmö.se")));
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("Га́рри Ки́мович Каспа́ров")));
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("Mǎ Yún")));
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("马云")));
    }

This will cause problems for anyone using Chinese or Russian characters in their CN. I don't know how common that is, but there are a couple of people in China using EJBCA. The standards don't really say what to do, so non-ASCII filenames are a bit of a mess, but my understanding is that most (all?) modern web browsers support UTF-8 characters in the …

    final String encodedFilename = URLEncoder.encode(filename, StandardCharsets.UTF_8);
    ec.setResponseHeader("Content-Disposition", "attachment; filename*=UTF-8''" + encodedFilename);

As a sidenote, I asked ChatGPT and it seems to agree, and provides some more details:
    import java.net.URLEncoder;

    String filename = "my file with åäö characters.pdf";
    // URLEncoder does form encoding and turns spaces into "+", so map them back to "%20"
    String encodedFilename = URLEncoder.encode(filename, "UTF-8").replace("+", "%20");
    String contentDisposition = "attachment; filename*=UTF-8''" + encodedFilename;
    System.out.println(contentDisposition);
To keep backwards compatibility with older browsers, one could look at the …
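For reference, one common way to stay compatible with older clients (not necessarily what the truncated sentence above had in mind) is to send both a plain ASCII `filename` and an RFC 5987/8187 encoded `filename*` in the same header, as RFC 6266 allows. The sketch below uses only the JDK; `ContentDispositionSketch`, `asciiFallback`, `encodeExtValue` and `contentDisposition` are hypothetical names, and the choice of `_` as the replacement character is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ContentDispositionSketch {

    /** Legacy fallback: keep a conservative ASCII subset, replace everything else with '_'. */
    static String asciiFallback(String filename) {
        StringBuilder sb = new StringBuilder(filename.length());
        for (char c : filename.toCharArray()) {
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '.' || c == '-' || c == '_' || c == ' ';
            sb.append(safe ? c : '_');
        }
        return sb.toString();
    }

    /** Percent-encodes the name as UTF-8 for use in filename*=UTF-8''... */
    static String encodeExtValue(String filename) {
        try {
            // URLEncoder does form encoding: it emits "+" for spaces and leaves "*" as-is,
            // neither of which is right for an extended header value, so both are patched here.
            return URLEncoder.encode(filename, "UTF-8")
                    .replace("+", "%20")
                    .replace("*", "%2A");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds a header value carrying both the ASCII fallback and the UTF-8 name. */
    static String contentDisposition(String filename) {
        return "attachment; filename=\"" + asciiFallback(filename) + "\""
                + "; filename*=UTF-8''" + encodeExtValue(filename);
    }

    public static void main(String[] args) {
        // "malmo.se.pem" with o-umlaut, written as an escape to keep the source ASCII-only
        System.out.println(contentDisposition("malm\u00F6.se.pem"));
    }
}
```

Clients that understand `filename*` should prefer it, and the rest fall back to the ASCII name, so nobody ends up with a Base64 blob as a filename.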
6a106af to ce7f5e7
647a632 to e3961c2
We had issues with issuing certs containing Croatian symbols (žđšćč), as mentioned in #174, so I've looked into the parts that handle it and found slightly differing logic in the downloadToken function between EnrollWithRequestIdBean and EnrollMakeNewRequestBean.java, where one uses a getFileName() function to handle the name generation and the other does it inline. The one using the function also encodes the filename to a Base64 string, which bypasses the issues with non-ASCII characters.
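As a rough illustration of the behavior described above (not the actual EJBCA code), the logic amounts to: use the name directly when it is printable ASCII, otherwise fall back to Base64. The sketch below is JDK-only; `Base64FileNameSketch`, `isAsciiPrintable` and `toFileName` are hypothetical names, the regex stands in for whatever `StringTools.stripFilename` really does, and the URL-safe Base64 variant is my own choice so that the result cannot contain `/`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64FileNameSketch {

    /** True if every character is printable 7-bit ASCII (0x20..0x7E). */
    static boolean isAsciiPrintable(String s) {
        return s.chars().allMatch(c -> c >= 32 && c < 127);
    }

    /** ASCII names are sanitized and kept readable; anything else is Base64-encoded. */
    static String toFileName(String commonName) {
        if (commonName == null || commonName.isEmpty()) {
            return "certificatetoken"; // same default as in the diff above
        }
        if (isAsciiPrintable(commonName)) {
            // Stand-in for a filename sanitizer: replace anything unusual with '_'
            return commonName.replaceAll("[^A-Za-z0-9._-]", "_");
        }
        // Assumption: URL-safe alphabet without padding, to avoid '/' and '=' in filenames
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(commonName.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(toFileName("Test CA"));
        System.out.println(toFileName("malm\u00F6.se")); // non-ASCII, so Base64
    }
}
```

This avoids broken downloads, but the resulting filename is unreadable for exactly the users who have non-ASCII names.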