Skip to content

[Rule] ClosestString to ILP #1034

@zhangjieqingithub

Description

@zhangjieqingithub

Source

ClosestString

Source model issue: #1032

Target

ILP/i32

Motivation

This rule gives ClosestString immediate solver connectivity through the existing integer-linear-programming hub.

The formulation is compact: choose one alphabet symbol at each center position and minimize a radius variable that upper-bounds the Hamming distance to every input string.

Reference

Ming Li, Bin Ma, and Lusheng Wang, "On the closest string and substring problems," Journal of the ACM 49(2):157-171, 2002. https://doi.org/10.1145/506147.506150

Reduction Algorithm

Let the source instance have:

  • alphabet size q,
  • n input strings s_1, ..., s_n,
  • common string length m.

Construct an ILP/i32 instance as follows.

  1. For every center position j in {0,...,m-1} and alphabet symbol a in {0,...,q-1}, create an integer variable x_{j,a}.
    It will be forced to be binary and means: the center has symbol a at position j.

  2. Create one nonnegative integer radius variable R.

  3. For each position j, add the assignment constraint:
    sum_a x_{j,a} = 1.
    Since all ILP variables are nonnegative integers, this makes the x_{j,a} variables binary.

  4. For each input string s_i, add the radius constraint:
    R + sum_{j=0}^{m-1} x_{j, s_i[j]} >= m.

    This is equivalent to:
    R >= m - sum_j x_{j, s_i[j]},
    and the right-hand side is exactly the Hamming distance from the chosen center to s_i.

  5. Minimize R.

  6. Recover the center string by, for each position j, choosing the unique symbol a with x_{j,a} = 1.

Correctness Argument

The assignment constraints ensure that every feasible ILP solution encodes exactly one center string c in Sigma^m.

For a fixed input string s_i, the quantity sum_j x_{j, s_i[j]} counts the number of positions where c matches s_i. Therefore m - sum_j x_{j, s_i[j]} is exactly d_H(c, s_i).

The constraint R + sum_j x_{j, s_i[j]} >= m forces R >= d_H(c, s_i) for every input string. Minimizing R therefore minimizes the maximum Hamming distance from the center to all input strings.

Thus optimal ILP solutions are in one-to-one correspondence with optimal closest strings, ignoring the auxiliary radius variable.

Size Overhead

Let:

  • q = alphabet_size,
  • m = string_length,
  • n = num_strings.
Target metric Formula
num_vars q * m + 1
num_constraints m + n

Validation Method

Use closed-loop validation on small instances:

  • enumerate all center strings directly and compute the optimum radius,
  • solve the ILP instance,
  • extract the center string from the x variables,
  • verify that the extracted center has the same optimum radius.

Include examples where:

  • several centers tie for optimum,
  • the optimum radius is nonzero,
  • no input string itself is uniquely optimal,
  • the alphabet has more than two symbols.

Example

Use the source instance from #1032:

  • alphabet {0,1},
  • strings 000, 011, 101, 110.

Here q = 2, m = 3, n = 4.

The ILP has:

  • 2 * 3 = 6 center-symbol variables,
  • one radius variable R,
  • 3 assignment constraints,
  • 4 radius constraints.

The center 000 is represented by:

  • x_{0,0} = 1, x_{0,1} = 0,
  • x_{1,0} = 1, x_{1,1} = 0,
  • x_{2,0} = 1, x_{2,1} = 0.

For input string 011, the radius constraint is:
R + x_{0,0} + x_{1,1} + x_{2,1} >= 3.

Substituting the center 000 gives:
R + 1 + 0 + 0 >= 3,
so R >= 2.

The other nonzero-distance strings similarly force R >= 2, and setting R = 2 is feasible. Therefore the ILP optimum is 2, matching the ClosestString optimum.

BibTeX

@article{LiMaWang2002ClosestStringSubstring,
  author  = {Ming Li and Bin Ma and Lusheng Wang},
  title   = {On the Closest String and Substring Problems},
  journal = {Journal of the ACM},
  volume  = {49},
  number  = {2},
  pages   = {157--171},
  year    = {2002},
  doi     = {10.1145/506147.506150},
  url     = {https://doi.org/10.1145/506147.506150}
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    ruleA new reduction rule to be added.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions