Fix Code Error

How do I merge two dictionaries in a single expression (taking union of dictionaries)?

March 13, 2021 by Code Error
Posted By: Carl Meyer

I have two Python dictionaries, and I want to write a single expression that returns these two dictionaries, merged (i.e. taking the union). The update() method would be what I need, if it returned its result instead of modifying a dictionary in-place.

>>> x = {'a': 1, 'b': 2}
>>> y = {'b': 10, 'c': 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{'a': 1, 'b': 10, 'c': 11}

How can I get that final merged dictionary in z, not x?

(To be extra-clear, the last-one-wins conflict-handling of dict.update() is what I’m looking for as well.)

Solution

How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly merged dictionary with values from y replacing those from x.

  • In Python 3.9.0 or greater (released 5 October 2020): PEP 584 was implemented and provides the simplest method, the dict union operator:

    z = x | y          # NOTE: 3.9+ ONLY
    
  • In Python 3.5 or greater:

    z = {**x, **y}
    
  • In Python 2 (or 3.4 or lower), write a function:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with x's keys and values
        z.update(y)    # modifies z with y's keys and values & returns None
        return z
    

    and now:

    z = merge_two_dicts(x, y)
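
As a quick check of the three options above, the following sketch runs each of them on the dictionaries from the question and confirms that they agree and leave x and y untouched. The version guard around x | y is an addition for illustration, so the snippet also runs on interpreters older than 3.9; the nested and merged names at the end are hypothetical and only show what "shallowly merged" means.

```python
import sys

x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}

def merge_two_dicts(x, y):
    z = x.copy()   # shallow copy of x
    z.update(y)    # y's values overwrite x's on shared keys
    return z

results = [{**x, **y}, merge_two_dicts(x, y)]
if sys.version_info >= (3, 9):
    results.append(x | y)  # the union operator only exists on 3.9+

expected = {'a': 1, 'b': 10, 'c': 11}
all_agree = all(z == expected for z in results)
originals_untouched = x == {'a': 1, 'b': 2} and y == {'b': 10, 'c': 11}
print(all_agree, originals_untouched)  # True True

# "Shallow" means values are shared between the dicts, not copied:
nested = {'items': [1]}
merged = {**nested, 'extra': 0}
merged['items'].append(2)
print(nested['items'])  # [1, 2]
```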
    

Explanation

Say you have two dictionaries and you want to merge them into a new dict without altering the original dictionaries:

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dictionary’s values overwriting those from the first.

>>> z
{'a': 1, 'b': 3, 'c': 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, 'foo': 1, 'bar': 2, **y}

and now:

>>> z
{'a': 1, 'b': 3, 'foo': 1, 'bar': 2, 'c': 4}

It is listed as implemented in the release schedule for 3.5 (PEP 478), and it is documented in What’s New in Python 3.5.

However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x’s values, so 'b' will point to 3 in our final result.

Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant approach that is also correct is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dictionaries, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an undefined number of dictionaries, from zero to a very large number:

def merge_dicts(*dict_args):
    """
    Given any number of dictionaries, shallow copy and merge into a new dict,
    precedence goes to key-value pairs in latter dictionaries.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

This function will work in Python 2 and 3 for all dictionaries. For example, given dictionaries a to g:

z = merge_dicts(a, b, c, d, e, f, g) 

and key-value pairs in g will take precedence over those in dictionaries a to f, and so on.
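
To make the precedence rule concrete, here is a small sketch that layers three dictionaries with merge_dicts. The dictionary names (defaults, config_file, cli_args) and their contents are hypothetical and chosen only for illustration.

```python
def merge_dicts(*dict_args):
    """Shallow-merge any number of dicts; later dicts take precedence."""
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

# Hypothetical layered settings: each later source overrides the earlier ones.
defaults = {'host': 'localhost', 'port': 8000, 'debug': False}
config_file = {'port': 8080}
cli_args = {'debug': True}

settings = merge_dicts(defaults, config_file, cli_args)
print(settings['host'], settings['port'], settings['debug'])  # localhost 8080 True

# With no arguments, the result is simply an empty dict.
empty = merge_dicts()
print(empty)  # {}
```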

Critiques of Other Answers

Don’t use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory (one for each dict), create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you’re adding two dict_items objects together, not two lists:

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined with regard to precedence. So don’t do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Here’s an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}
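
The reason the winner is arbitrary can be seen by inspecting the union itself. In this sketch (Python 3 only; the variable names are chosen for illustration), the set holds both pairs for 'a', and dict() keeps whichever one happens to be iterated last:

```python
x = {'a': 2}
y = {'a': 1}

# The union of the two items views is a plain set of (key, value) tuples.
union = x.items() | y.items()
print(len(union))  # 2, both ('a', 2) and ('a', 1) are present

# dict() keeps the last pair it sees for each key, and set iteration
# order is not something the merge can rely on.
merged_from_set = dict(union)

# Unpacking has well-defined precedence: y always wins.
merged = {**x, **y}
print(merged)  # {'a': 1}
```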

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process). But unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it’s difficult to read. It’s not the intended usage, and so it is not Pythonic.

Here’s an example of the usage being remediated in Django.

Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings
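
A minimal sketch of the same failure next to the unpacking syntax, which does not have this restriction. It assumes Python 3.5 or later; the tuple key is an arbitrary example of a hashable key that is not a string.

```python
x = {'a': 1}
y = {('b', 'c'): 2}  # a tuple key: hashable, but not a string

# dict(x, **y) passes y's keys as keyword argument names, so it fails here.
try:
    z = dict(x, **y)
    failed = False
except TypeError:
    failed = True
print(failed)  # True

# Dict unpacking accepts any hashable key.
z = {**x, **y}
print(z)  # {'a': 1, ('b', 'c'): 2}
```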

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with
declaring dict({}, **{1:3}) illegal, since after all it is abuse of
the ** mechanism.

and

Apparently dict(x, **y) is going around as "cool hack" for "call
x.update(y) and return x". Personally, I find it more despicable than
cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

dict(a=1, b=10, c=11)

instead of

{'a': 1, 'b': 10, 'c': 11}

Response to comments

Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.

Again, it doesn’t work in Python 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{('a', 'b'): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{('a', 'b'): None})
{('a', 'b'): None}

This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

More comments:

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we’re actually concerned about readability. And dict(x.items() + y.items()) is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged […] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first’s values being overwritten by the second’s – in a single expression.

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

Usage:

>>> x = {'a':{1:{}}, 'b': {2:{}}}
>>> y = {'b':{10:{}}, 'c': {11:{}}}
>>> dict_of_dicts_merge(x, y)
{'b': {2: {}, 10: {}}, 'a': {1: {}}, 'c': {11: {}}}
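
To contrast the two behaviours side by side, this sketch repeats the function above and compares it with the shallow {**x, **y} merge on the same inputs. It also checks that changing the merged result leaves x untouched. The 'new' key is an arbitrary value added only for the check.

```python
from copy import deepcopy

def dict_of_dicts_merge(x, y):
    """Recursively merge two dicts of dicts without modifying either."""
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

x = {'a': {1: {}}, 'b': {2: {}}}
y = {'b': {10: {}}, 'c': {11: {}}}

shallow = {**x, **y}
deep = dict_of_dicts_merge(x, y)

# The shallow merge replaces x['b'] wholesale; the recursive one combines it.
print(shallow['b'])                  # {10: {}}
print(deep['b'] == {2: {}, 10: {}})  # True

# The recursive result holds copies, so mutating it does not touch x.
deep['a'][1]['new'] = 'value'
print(x['a'])  # {1: {}}
```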

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence).

You can also chain the dictionaries manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2

itertools.chain will chain the iterators over the key-value pairs in the correct order:

from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2
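
The same idea extends to a sequence of dictionaries with chain.from_iterable. This is a sketch for Python 3; the dicts list is made up for the example.

```python
from itertools import chain

# Hypothetical input: any iterable of dictionaries, in order of increasing precedence.
dicts = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}, {'c': 5}]

# Lazily chain every (key, value) pair; later pairs overwrite earlier ones.
z = dict(chain.from_iterable(d.items() for d in dicts))
print(z)  # {'a': 1, 'b': 3, 'c': 5}
```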

Performance Analysis

I’m only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

from timeit import repeat
from itertools import chain

x = dict.fromkeys('abcdefg')
y = dict.fromkeys('efghijk')

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))

In Python 3.8.1, NixOS:

>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux
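
The calls above rely on the default of one million executions per timing run in timeit.repeat, which is why each figure is on the order of seconds. If you only want a rough ranking on your own machine, a shorter harness along these lines may be enough. The candidates dictionary, its labels, and the reduced repeat and number values are choices made for this sketch, and the absolute numbers will differ from those shown above.

```python
from timeit import repeat
from itertools import chain

x = dict.fromkeys('abcdefg')
y = dict.fromkeys('efghijk')

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

# Labels are illustrative; each callable performs one merge of x and y.
candidates = {
    'unpacking': lambda: {**x, **y},
    'copy + update': lambda: merge_two_dicts(x, y),
    'dict comprehension': lambda: {k: v for d in (x, y) for k, v in d.items()},
    'itertools.chain': lambda: dict(chain(x.items(), y.items())),
}

# Take the best of 3 runs of 10000 executions each, then print fastest first.
timings = {
    name: min(repeat(fn, repeat=3, number=10000))
    for name, fn in candidates.items()
}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print('{:<20} {:.6f}'.format(name, seconds))
```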

Resources on Dictionaries

  • My explanation of Python’s dictionary implementation, updated for 3.6.
  • Answer on how to add new keys to a dictionary
  • Mapping two lists into a dictionary
  • The official Python docs on dictionaries
  • The Dictionary Even Mightier – talk by Brandon Rhodes at Pycon 2017
  • Modern Python Dictionaries, A Confluence of Great Ideas – talk by Raymond Hettinger at Pycon 2017
Answered By: Anonymous

Disclaimer: This content is shared under the Creative Commons license CC BY-SA 3.0. It is generated from the Stack Exchange network.
