# Pandas / Python – Compare numerically each row with all others

###### Posted By: Anonymous

I program in Python, and I would like to compare, in an optimized way, each row of a numerical pandas DataFrame with every row of that same DataFrame.

For example, if I have:

```
A = pd.DataFrame([[1,1,1,1,1],[1,1,1,1,1],[2,2,2,2,2], [1,10,1,1,1]])
thr = 0.3
B = A * thr
```

With A as follows:

|   | 0 | 1  | 2 | 3 | 4 |
|---|---|----|---|---|---|
| 0 | 1 | 1  | 1 | 1 | 1 |
| 1 | 1 | 1  | 1 | 1 | 1 |
| 2 | 2 | 2  | 2 | 2 | 2 |
| 3 | 1 | 10 | 1 | 1 | 1 |

And B as follows:

|   | 0   | 1   | 2   | 3   | 4   |
|---|-----|-----|-----|-----|-----|
| 0 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| 1 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| 2 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 |
| 3 | 0.3 | 3.0 | 0.3 | 0.3 | 0.3 |

My aim is to build a dictionary that maps each row index of A to the list of row indices of B whose cells are all less than the corresponding cells of that row of A, as follows:

```
{
0:[1,2],
1:[0,2],
2:[0,1],
3:[0,1,2]
}
```

For example, every cell of A's row 0 is greater than the corresponding cells of B's rows 1 and 2, so the dictionary contains `0: [1, 2]`.

My question is: what is the best way to compute this, i.e. the fastest in execution time and the cheapest computationally? For loops, pandas functions, `apply`/`applymap`, etc.? Would you have some Python code to do it?

Thank you very much in advance.

Best regards

## Solution

One option is to broadcast a greater-than comparison over both frames, then check along axis 2 which rows have `all` values true:

```
all_greater = (A.values[:, None] > B.values).all(axis=2)
```

```
[[ True  True  True False]
 [ True  True  True False]
 [ True  True  True False]
 [ True  True  True  True]]
```

Here each column represents a row in B, and each row represents a row from A.

The first row, `[ True  True  True False]`, indicates that row 0 of A is greater than rows 0, 1 and 2 of B.
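As a quick sanity check on the shapes involved, here is a minimal sketch (reusing the example frames from the question) showing how the broadcast produces one boolean flag per (A-row, B-row) pair:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([
    [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [1, 10, 1, 1, 1]
])
B = A * 0.3

# A.values[:, None] has shape (4, 1, 5); B.values has shape (4, 5).
# Broadcasting compares every row of A against every row of B,
# giving a (4, 4, 5) boolean array of element-wise results.
cmp = A.values[:, None] > B.values
print(cmp.shape)  # (4, 4, 5)

# Reducing with all() over the last axis collapses the five
# element-wise comparisons into one flag per pair of rows.
all_greater = cmp.all(axis=2)
print(all_greater.shape)  # (4, 4)
```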

From this, create a new frame via `np.nonzero` + `np.transpose`:
```
df = pd.DataFrame(np.transpose(np.nonzero(all_greater)), columns=['A', 'B'])
```

```
    A  B
0   0  0
1   0  1
2   0  2
3   1  0
4   1  1
5   1  2
6   2  0
7   2  1
8   2  2
9   3  0
10  3  1
11  3  2
12  3  3
```
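As a side note, NumPy provides `np.argwhere`, which is documented shorthand for exactly this `np.transpose(np.nonzero(...))` combination, so the frame can also be built as:

```python
import numpy as np
import pandas as pd

# The boolean matrix from the broadcast comparison above
all_greater = np.array([
    [True, True, True, False],
    [True, True, True, False],
    [True, True, True, False],
    [True, True, True, True],
])

# np.argwhere returns one (row, column) pair per True entry,
# equivalent to np.transpose(np.nonzero(all_greater))
pairs = np.argwhere(all_greater)
df = pd.DataFrame(pairs, columns=['A', 'B'])
print(df.head(3))
```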

Then filter out the self matches, and `groupby` + `agg` into lists:

```
d = df[df['A'].ne(df['B'])].groupby('A')['B'].agg(list).to_dict()
```

```
{0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [0, 1, 2]}
```

Complete Working Example:

```
import numpy as np
import pandas as pd
A = pd.DataFrame([
[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [1, 10, 1, 1, 1]
])
thr = 0.3
B = A * thr
all_greater = (A.values[:, None] > B.values).all(axis=2)
df = pd.DataFrame(np.transpose(np.nonzero(all_greater)), columns=['A', 'B'])
d = df[df['A'].ne(df['B'])].groupby('A')['B'].agg(list).to_dict()
print(d)
```
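For comparison, here is a minimal alternative sketch that skips the intermediate DataFrame: it blanks the diagonal with `np.fill_diagonal` and collects the matches with a dict comprehension. Unlike the `groupby` version, it would also emit an empty list for any A row with no matches.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([
    [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [1, 10, 1, 1, 1]
])
thr = 0.3
B = A * thr

# Same broadcast comparison as above: (4, 1, 5) > (4, 5) -> (4, 4, 5)
all_greater = (A.values[:, None] > B.values).all(axis=2)

# Drop the self matches up front instead of filtering a frame later
np.fill_diagonal(all_greater, False)

# One list of matching B-row indices per A row
d = {i: np.flatnonzero(row).tolist() for i, row in enumerate(all_greater)}
print(d)  # {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [0, 1, 2]}
```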

###### Answered By: Anonymous

Disclaimer: This content is shared under the Creative Commons license CC BY-SA 3.0. It is generated from the StackExchange website network.