Problems & Puzzles: Puzzles

Puzzle 896. Optimal packing of a set of integers

Here we ask for an algorithm that produces an optimal answer. The original problem goes back up to 1979 (See Ref 3). But in this puzzle I use a numerical version of it.

Problem. "Given a set of decimal integers, what is the shortest string S of L decimal digits that contains all the integers as subsequences in the asked string."�

Example:

Set of integers: {123, 451, 46733}
Optimal string:� S = 46751233, L = 8.
Usually there are several distinct strings S with the same L minimal value.

Regarding the asked algorithm remember that we ask here for the optimal solution, that is to say, the minimal L for the S asked solution.

There are some other very simple algorithm & solutions, but these are not the optimal asked ones:

a) The worst solution LWS, is just to concatenate all the integers of the set. For this solution L is the sum of the lengths of all the integers.

b) A better solution LBS is to sum the counts of the distinct digits in each position of the integers. In this case, L is the sum of all the counts.

But none of these algorithms produce the optimal solution LOS, as you soon will discover.

Going back to the example shown above, LWS = 3+3+5 = 11, LBS = 2+3+3+1+1 = 10 but LOS = 8.

OK. Hands to work! Let's define three sets:

Set A: {4242064951, 5519385407, 6886477529, 3816749731, 4700006089, 9650560979, 4371455527, 5419320281, 8761823531, 3454277693}

Set B: {Prime numbers from 2 to 97}

Set C: {Prime numbers from 2 to 997}

Q1. Develop and send your algorithm description.

Q2. Send your best string S solutions for Sets A, B & C, respectively, as produced by your algorithm. [Starting point: my best solutions are L=45, 11, 21 for sets A, B & C, respectively]

References:

  1. PLANET PACKING by A. ROSS ECKLER (2001)
  2. PLANET PACKING REVISITED by DANA RICHARDS (2001)
  3. Computers and Intractability: A Guide to the Theory of NP-Completeness, by�Michael Garey�and�David S. Johnson. Appendix A4.2, SR8, Shortest Common Super sequence.


On Feb. 8, 2018 Dmitry Kamenetsky wrote:

What an interesting puzzle, I had some great time with it!

For Set A, I found a solution with 36 digits:

435874506198230864527700564095278391

For primes < 100 (Set B), 11 digits is the best we can do.

For primes < 1000 (Set C), 21 digits is the best we can do. I did find an almost-solution with 20 digits, but it is missing 661 (need an extra 6): 81427359628750419371

I then looked for primes < 10,000 and found a solution with 30 digits:

425813679450328167906458231793

My friend (Tomek Czajka) showed that this is optimal:

"You need at least 2 zeroes and 3 of every other digit, because there are primes containing that many. So that's at least 29. Suppose you use exactly that many.
7333, 3733, 3373 are all prime, so digits 737373 have to be in that order.
3331 is a prime, so there is a 1 after the last 3.
1117 is a prime, so all three 1s are before the last 7.
Contradiction!"

I then looked for primes < 100,000 and found a solution with 40 digits:

2145879637814520936740281539610872549371. I don't know if it is optimal.

...

Here is a brief description of my algorithm.

The key to this problem was finding a good method to evaluate partial solutions. Suppose we have built a string S and we also have a list of strings P (such as primes below 1000) that we want to be subsequences of S. I used the following scoring function:
score=0
for p in P:
� lcs=longestCommonSubsequence(S,p)
� score+=(p.length()-lcs.length())
return score
longestCommonSubsequence(a,b) computes the longest common subsequence between a and b.�If the length of lcs is the same as the length of p then p is a subsequence of S. Otherwise we want the lengths to be as close as possible. A solution is obtained when the final score is 0. For this function�I used the algorithm from Wikipedia:
Then I used a hill-climbing approach to find the best solution. It looks something like this:
Initialise S randomly
while (true):
� mutate(S)� � #change some digits of S
� optimize(S)� �#try making all possible pairwise digit swaps, as well as, changing each digit. Only keep changes that improve the score
� if the score of S has improved then we keep it and print it, otherwise we reject all changes

...

The algorithm part can be improved with this:
Initialise S randomly
while (true):
� S'=mutate(S)� � #change some digits of S
� optimize(S')� �#try making all possible pairwise digit swaps, as well as, changing each digit. Only keep changes that improve the score
� if score(S') < score(S):
� � S=S'

***

On Feb 23, 2018 Dmitry wrote:

I've created some sequences related to this puzzle:

***


Records�� |� Conjectures� |� Problems� |� Puzzles