# How to generate combinatorial vectors avoiding some determinated subvectors

Asked by Lorenzo on 8 Aug 2012
Latest activity Commented on by Lorenzo on 9 Aug 2012

Hello, I have to generate in a script many combinatorial vectors, and the matter is the time: the script would take many months, probably some years to finish. The last step in the script is to generate all the (60¦9)=60!/(9!*54!) combinatorial vectors of lenght 9 (one at a time, using a vector called a and overwriting the previous vector), where the components are ordered, different and assume integer values between 1 and 60, that are

```(1,2,3,4,5,6,7,8,9)
(1,2,3,4,5,6,7,8,10)
(1,2,3,4,5,6,7,8,11)
...    ...
(1,2,3,4,5,6,7,8,60)
(1,2,3,4,5,6,7,9,10)
(1,2,3,4,5,6,7,9,11)
...    ...
(52,53,54,55,56,57,58,59,60).
```

For doing this I used the sequence of commands

```a = int8(1:9);
finish = 1;
while finish == 1
i = 9;
while i>0 & a(i) == 51+i
i = i-1;
end
if i == 0
finish = 0;
else
a(i) = a(i)+1;
while i<9
a(i+1) = a(i)+1;
i = i+1;
end
end
end
```

The problem is that the script must test each one of these vectors and delete those which contain some combinatorial vectors of lenght between 2 and 8: for example, if the script must test if the vector

a = [a(1) a(2) a(3) a(4) a(5) a(6) a(7) a(8) a(9)]

doesn't contain nor the vector of lenght 3 [4 19 47] neither the vector of lenght 5 [3 10 16 40 50], I used the command

```sum(ismember([4 19 47],nchoosek(a,3))'rows') +
sum(ismember([3 10 16 40 50],nchoosek(a,5))'rows') == 0 .
```

Generating all these vectors, and, ABOVE ALL, using many times the command "sum(ismember(...))" causes a huge loss of time.

My question is if it is possible, rather than generate all the vectors and then exclude (using "sum(ismember(..))") those which contain some subvectors found before (the script begins with vectors of lenght 2, increasing by 1 the lenght of the vectors when it has finished with all the vectors of lenght 2, and so on, arriving to vectors of lenght 9), TO GENERATE DIRECTLY ONLY THE VECTORS WHICH DO NOT CONTAIN THE GIVEN SUBVECTORS.

P. S. Someone has suggested me to write a script with Fortran90, or C++, and to parallelize it. This could be already a solution, but I'd like to know if there can be a solution ALSO BY IMPROVING THE ALGORITHM.

Thanks to those who will read this text until the end, every kind of answer will useful to me.

Lorenzo Bussoli

## Products

No products are associated with this question.

Answer by Matt Fig on 8 Aug 2012

As far as improving the efficiency of your code, I have some questions about what you are doing.

First, I do not see how your loop is generating what you say it is. When I make it display 'a' just before the last END, the first vector is:

```1    2    3    4    5    6    7    8   10
```

and not:

```1    2    3    4    5    6    7    8   9
```

Second, I assume that it is right before the last END in your loop that you are using a for whatever other purpose you have, and this is where you do your check. Is that correct?

Third, I don't understand your check. In one place you say that you want to check if 'a' has any "combinatorial vectors" but later it looks like you are looking for an exact match. For example, say

```a = [2 19 4 47 3]
```

and we want to check if 'a' contains any combinatorial of the vector

```s = [4 19 47];
```

meaning that if a has any of these subvectors:

```[4 19 47; 4 47 19; 19 4 47; 19 47 4; 47 4 19; 47 19 4]
```

then we will get a positive return. Your statement does not do this

```TF = ismember(s,nchoosek(a,3),'rows');  % TF==0
```

But then later you seem to be saying that you only want to see if 'a' contains the exact vector 's', not a permutation. If this is the case, why make it so hard (and slow) using ISMEMBER and NCHOOSEK? It is much faster to do:

```a = [2 4 19 47 3];  % New vector which contains vect 's'
TF = ismember(s,nchoosek(a,3),'rows');  % TF==1
TF2 = any(strfind(a,s));  %TF2==1
```

Once things are a little clearer, we might be able to help you speed this up even more.

## 1 Comment

Lorenzo on 9 Aug 2012

Good afternoon Matt Fig, to answer the questions I report all the part of the script at the step 9 (the last step), except the recall of a function (indicated by %%%%), which is not important for my questions:

```a = int8(1:9);
if sum(ismember(nchoosek(a,2),F2S,'rows')) + sum(ismember(nchoosek(a,3),F3S,'rows')) + sum(ismember(nchoosek(a,4),F4S,'rows')) + sum(ismember(nchoosek(a,5),F5S,'rows')) + sum(ismember(nchoosek(a,6),F6S,'rows')) + sum(ismember(nchoosek(a,7),F7S,'rows')) + sum(ismember(nchoosek(a,8),F8S,'rows')) == 0
%%%% here we recall a function that makes a test on the vector a =                %%%% = [1 2 3 4 5 6 7 8 9]
end
finish = 1;
while finish == 1
i = 9;
while i>0 & a(i) == 51+i
i = i-1;
end
if i == 0
save('matrixF9S','F9S');
finish = 0;
else
a(i) = a(i)+1;
while i<9
a(i+1) = a(i)+1;
i = i+1;
end
if sum(ismember(nchoosek(a,2),F2S,'rows')) + sum(ismember(nchoosek(a,3),F3S,'rows')) + sum(ismember(nchoosek(a,4),F4S,'rows')) + sum(ismember(nchoosek(a,5),F5S,'rows')) + sum(ismember(nchoosek(a,6),F6S,'rows')) + sum(ismember(nchoosek(a,7),F7S,'rows')) + sum(ismember(nchoosek(a,8),F8S,'rows')) == 0
%%%% here we recall a function that makes a test on the vector a =                %%%% = [a(1) a(2) a(3) a(4) a(5) a(6) a(7) a(8) a(9)]
end
end
end
```

All the vectors involved have ordered and not repeated components (so a never assume the value [2 19 4 47 3]): the matrixes F2S, F3S, F4S, F5S,F6S, F7S, F8S have respectively vectors of lenght 2, 3, 4, 5, 6, 7, and 8 as rows: these vectors have been founded in the previous steps. The check that I have to do is, if, for example, F3S = [4 19 47;5 7 10;5 6 13;6 13 22], if no one of the vectors that are rows of F3S has all its three components in a (for example, [1 2 3 5 6 7 9 10] is not ok, but [1 2 3 4 6 7 9 10] is ok): I don't mind the permutation of the vectors, but, because of a has lenght 9 and the rows of F3S has lenght 3, I used the command sum(ismember(nchoosek(a,3),F3S,'rows')) == 0.

Can you tell me if my explanation is clear? I don't know if my question is clear to you (my explanation is a bit complicated). Thank you very much. Lorenzo.