Posted By: Anonymous
I have 3 CSV files. Each has the first column as the (string) names of people, while all the other columns in each dataframe are attributes of that person.
How can I “join” together all three CSV documents to create a single CSV with each row having all the attributes for each unique value of the person’s string name?
The join() function in pandas specifies that I need a MultiIndex, but I’m confused about what a hierarchical indexing scheme has to do with making a join based on a single index.
import pandas as pd
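On the MultiIndex point, a plain single-level index is enough for this. Below is a minimal sketch, assuming each CSV has already been read into a DataFrame and its name column has been made the index with set_index. DataFrame.join accepts a list of frames and aligns them all on that index. The frames df_a, df_b and df_c and their contents are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data standing in for the three CSV files,
# each indexed by the person's name (a single-level index).
df_a = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]}).set_index("name")
df_b = pd.DataFrame({"name": ["Alice", "Bob"], "city": ["Paris", "Oslo"]}).set_index("name")
df_c = pd.DataFrame({"name": ["Alice"], "job": ["Engineer"]}).set_index("name")

# Joining a list of frames is only supported on the index, which is why
# "name" was made the index above. how="outer" keeps every name;
# attributes missing for a person become NaN.
combined = df_a.join([df_b, df_c], how="outer")
print(combined)
```

Here Bob has no row in df_c, so his job comes out as NaN rather than being dropped.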
John Galt’s answer is basically a reduce operation. If I have more than a handful of dataframes, I’d put them in a list like this (generated via list comprehensions, loops, or whatnot):
dfs = [df0, df1, df2, dfN]
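For instance, the list could be built with a list comprehension over the input files. This is a minimal sketch: the io.StringIO objects are hypothetical stand-ins for real file paths, which pd.read_csv also accepts directly.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for three CSV files; in practice these
# would be paths such as "people1.csv" passed straight to pd.read_csv.
sources = [
    io.StringIO("name,age\nAlice,30\nBob,25\n"),
    io.StringIO("name,city\nAlice,Paris\nBob,Oslo\n"),
    io.StringIO("name,job\nAlice,Engineer\nBob,Chef\n"),
]

# One DataFrame per source, collected with a list comprehension.
dfs = [pd.read_csv(src) for src in sources]
```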
Assuming they have some common column, like name in your example, I’d do the following:
df_final = reduce(lambda left, right: pd.merge(left, right, on='name'), dfs)
That way, your code should work with whatever number of dataframes you want to merge.
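One detail worth knowing: pd.merge performs an inner join by default, so the reduce above keeps only names that appear in every dataframe. A sketch with hypothetical sample data comparing that default against how="outer", which keeps every name and fills missing attributes with NaN:

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data: not every person appears in every frame.
df0 = pd.DataFrame({"name": ["Alice", "Bob", "Carol"], "age": [30, 25, 41]})
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "city": ["Paris", "Oslo"]})
df2 = pd.DataFrame({"name": ["Alice", "Carol"], "job": ["Engineer", "Pilot"]})
dfs = [df0, df1, df2]

# Default how="inner": only names present in all frames survive.
inner = reduce(lambda left, right: pd.merge(left, right, on="name"), dfs)

# how="outer": every name is kept; missing attributes become NaN.
outer = reduce(
    lambda left, right: pd.merge(left, right, on="name", how="outer"), dfs
)

print(inner)
print(outer)
```

Here only Alice is in all three frames, so inner has a single row, while outer has rows for Alice, Bob and Carol.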
Edit August 1, 2016: For those using Python 3: reduce is no longer a built-in; it has been moved into functools. So to use it, you’ll first need to import it from that module:
from functools import reduce
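Putting it together for Python 3, and finishing the original goal of producing a single CSV, here is a minimal sketch. It assumes in-memory strings stand in for the three input files and that every name appears in all of them (otherwise the default inner merge drops it). The output file name merged.csv mentioned in the comment is hypothetical.

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical in-memory contents of the three input CSV files.
csv_texts = [
    "name,age\nAlice,30\nBob,25\n",
    "name,city\nAlice,Paris\nBob,Oslo\n",
    "name,job\nAlice,Engineer\nBob,Chef\n",
]
dfs = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# Merge all frames pairwise on the shared "name" column.
df_final = reduce(lambda left, right: pd.merge(left, right, on="name"), dfs)

# With no path, to_csv returns the CSV as a string; index=False omits the
# row index. In a real script you might write
# df_final.to_csv("merged.csv", index=False) instead.
output = df_final.to_csv(index=False)
print(output)
```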