Fix leading space in surnames after capitalize() with empty middle name#164
Conversation
capitalize() split each attribute with str.split(' '), which returns ['']
(not []) for an empty string. cap_piece() returns '' for an empty part, so an
empty middle name produced middle_list = [''], which leaked into surnames_list
(middle_list + last_list) and yielded a leading space in the surnames property:
>>> hn = HumanName('john doe'); hn.capitalize(); hn.surnames
' Doe' # leading space (should be 'Doe')
The same spurious '' element also appeared in title_list/first_list/last_list
for empty attributes. Using str.split() instead returns [] for empty strings
and is equivalent for the already-whitespace-collapsed pieces cap_piece()
returns. The suffix split (', ') is intentionally left unchanged.
Added a regression test in HumanNameCapitalizationTestCase.
50b95d5 to
7fbb73c
Compare
|
Rebased onto master (1f46568) to clear a merge conflict from the test-suite reorg in #165. The monolithic tests.py is gone, so I moved the regression test into tests/test_capitalization.py next to the other capitalize cases. The parser.py fix itself is unchanged (the four |
- Split test_capitalize_empty_name_part_has_no_leading_space_in_surnames
into three focused tests (normal path, force=True path, list invariants)
so failures self-localize to the scenario
- Added test_capitalize_title_and_last_only_no_spurious_tokens covering
a name with empty first and middle simultaneously
- Sharpened inline comments to accurately describe root cause and why
force=True matters (early-return guard)
- Added comment to suffix_list line explaining the intentional split(', ')
asymmetry vs the space-delimited attributes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
''.split(', ') returns [''] just like ''.split(' ') did for the other
attributes. Use a filtered list comprehension to preserve the comma
delimiter while dropping empty tokens, making suffix_list consistent
with the [] invariant the rest of the codebase relies on.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Additional fix included:
|
Summary
After
capitalize()is called on a name with any absent attribute, the corresponding*_listattribute becomes['']instead of[]. The most visible symptom is a spurious leading space insurnames(e.g.' Doe'instead of'Doe'). The same class of bug also affectssuffix_list.Reproduction
The leading space is a real observable artifact, not just an empty-list detail: downstream code that joins or compares surnames silently breaks.
Cause
cap_piece(...)returns''for an absent name part. Splitting an empty string with an explicit separator always preserves a spurious empty token:''.split(' ')→['']''.split(', ')→['']whereas
str.split()with no argument returns[]for an empty string.Fix
Whitespace-delimited attributes (
title,first,middle,last)Drop the explicit separator — bare
.split()correctly returns[]for an empty string:Suffix
suffix_listmust keep its comma delimiter (suffixes are stored as"Ph.D., M.D."), so bare.split()can't be used. Instead, filter empty tokens after splitting:Tests
force=Truepath, list invariants)test_capitalize_title_and_last_only_no_spurious_tokens— exercises emptyfirst_listandmiddle_listsimultaneously[], single suffix, multiple suffixesFull suite: 686 tests OK (20 expected failures).