python pandas – add unique Ids in column from master df back in to processed dfs stored in list of dataframes
Posted By: Anonymous
I have a single df that includes multiple json strings per row that need reading and normalizing.
I can read out the json info and normalize the columns by storing each row as a new dataframe in a list – which i have done with the code below.
However I need to append the original unique Id in the original df (i.e. ‘id’: [‘9clpa’,’g659am’]) – which is lost in my current code.
The expected output is a list of dataframes per Id that include the exploded json info, with an additional column including Id (which will be repeated for each row of the final df).
I hope that makes sense, any suggestions are very welcome. thanks so much
dataframe
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
current code
out={}
for i in range(len(df)):
out[i] = pd.read_json(df.i2[i])
out[i] = pd.json_normalize(out[i].q)
expected output
pd.DataFrame(data={'id': ['9clpa','9clpa'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"']})
pd.DataFrame(data={'id': ['g659am','g659am'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"']})
Solution
You are just missing on assigning the id
to your dataframe after your normalize columns:
out={}
for i in range(len(df)):
out[i] = pd.read_json(df.i2[i])
out[i] = pd.json_normalize(out[i].q)
out[i]['id'] = df.id[i]
out[i] = out[i].loc[:, ['id','qi','answers']]
Output:
>>> out[0]
id qi answers
0 9clpa 01 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'G', 'value': '3'}, {'answer': 'V', 'value': '4'}]
1 9clpa 02 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'A', 'value': '3'}, {'answer': 'B', 'value': '4'}, {'answer': 'G', 'value': '5'}, {'answer': 'NC', 'value': '6'}, {'answer': 'O', 'value': '7'}]
Answered By: Anonymous
Disclaimer: This content is shared under creative common license cc-by-sa 3.0. It is generated from StackExchange Website Network.