How are iloc and loc different?
Posted By: Anonymous
Can someone explain how these two methods of slicing are different?
I’ve seen the docs, and I’ve seen these answers, but I still find myself unable to understand how the two are different. To me, they seem largely interchangeable, because they are at the lower levels of slicing.
For example, say we want to get the first five rows of a DataFrame. How is it that these two work?
df.loc[:5]
df.iloc[:5]
Can someone present three cases where the distinction in use is clearer?
Once upon a time, I also wanted to know how these two functions differ from df.ix[:5], but ix has been removed from pandas 1.0, so I don’t care anymore.
Solution
Label vs. Location
The main distinction between the two methods is:

loc gets rows (and/or columns) with particular labels.
iloc gets rows (and/or columns) at integer locations.
To demonstrate, consider a series s of characters with a non-monotonic integer index:
>>> import pandas as pd
>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f
>>> s.loc[0]     # value at index label 0
'd'
>>> s.iloc[0]    # value at index location 0
'a'
>>> s.loc[0:1]   # rows at index labels between 0 and 1 (inclusive)
0    d
1    e
>>> s.iloc[0:1]  # rows at index locations between 0 and 1 (exclusive)
49    a
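To connect this back to the question's df.loc[:5] versus df.iloc[:5], here is a minimal sketch. It assumes a DataFrame with the default integer index; the eight-row frame and the column name "value" are made up for illustration. With the default index the two calls look almost identical, but loc includes the row labelled 5 while iloc stops before position 5. Once the rows are reordered, the labels and positions no longer coincide and the results diverge further.

```python
import pandas as pd

# Hypothetical frame for illustration; the default index holds the labels 0..7.
df = pd.DataFrame({"value": range(10, 18)})

by_label = df.loc[:5]      # labels up to and including 5
by_position = df.iloc[:5]  # positions 0..4; position 5 is excluded
print(len(by_label), len(by_position))

# Reverse the rows so the index reads 7, 6, ..., 0.
rev = df.iloc[::-1]
print(rev.loc[:5].index.tolist())   # from the first row down to the label 5
print(rev.iloc[:5].index.tolist())  # the first five rows, whatever their labels
```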
Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

<object> | description | s.loc[<object>] | s.iloc[<object>]
--- | --- | --- | ---
0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a')
0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0)
1:47 | slice with out-of-bounds end | Zero rows (empty Series) | Five rows (location 1 onwards)
1:47:-1 | slice with negative step | Three rows (labels 1 back to 47) | Zero rows (empty Series)
[2, 0] | integer list | Two rows with given labels | Two rows with given locations
s > 'e' | Bool series (indicating which values have the property) | One row (containing 'f') | NotImplementedError
(s>'e').values | Bool array | One row (containing 'f') | Same as loc
999 | int object not in index | KeyError | IndexError (out of bounds)
-1 | int object not in index | KeyError | Returns last value in s
lambda x: x.index[3] | callable applied to series (here returning the item at position 3 of the index) | s.loc[s.index[3]] | s.iloc[s.index[3]]
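Several of these cases can be checked directly. The following is a small sketch rather than an exhaustive test; the helper outcome() is a hypothetical name introduced here, and exact exception types may vary between pandas versions.

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

def outcome(indexer, key):
    """Return indexer[key], or the name of the exception it raises (hypothetical helper)."""
    try:
        return indexer[key]
    except Exception as exc:  # broad on purpose: only the exception type is reported
        return type(exc).__name__

# Slices: loc reads the bounds as labels, iloc reads them as positions.
print(len(s.loc[1:47]), len(s.iloc[1:47]))
print(s.loc[1:47:-1].index.tolist())
print(len(s.iloc[1:47:-1]))

# Integers that are not labels in the index.
print(outcome(s.loc, 999), outcome(s.iloc, 999))
print(outcome(s.loc, -1), outcome(s.iloc, -1))

# Boolean masks: a plain NumPy array is accepted by both indexers,
# whereas iloc rejects a boolean Series that carries its own index.
mask = (s > "e").to_numpy()
print(s.loc[mask].tolist(), s.iloc[mask].tolist())
print(outcome(s.iloc, s > "e"))
```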
loc's label-querying capabilities extend well beyond integer indexes, and it’s worth highlighting a couple of additional examples.
Here’s a Series where the index contains string objects:
>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a 49
b 48
c 47
d 0
e 1
f 2
Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:
>>> s2.loc['c':'e'] # all rows lying between 'c' and 'e' (inclusive)
c 47
d 0
e 1
For DateTime indexes, we don’t need to pass the exact date/time to fetch by label. For example:
>>> # freq='M' means month end; pandas 2.2 and later prefer the spelling 'ME'
>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M'))
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e
Then to fetch the row(s) for March/April 2021 we only need:
>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
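Because 'now' makes the output depend on when the code runs, a reproducible variant may be easier to experiment with. This sketch assumes fixed month-end dates chosen purely for illustration, and contrasts the partial-string label lookup with the equivalent positional slice.

```python
import pandas as pd

# Fixed month-end timestamps (chosen for illustration) instead of 'now',
# so the result does not depend on the current date.
idx = pd.to_datetime(
    ["2021-01-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"]
)
s3 = pd.Series(list("abcde"), index=idx)

# A partial date string selects every row that falls inside that period.
march_april = s3.loc["2021-03":"2021-04"]
print(march_april.tolist())

# The same rows by position; iloc knows nothing about dates.
print(s3.iloc[2:4].tolist())

# A single partial string returns all rows in that month.
print(s3.loc["2021-05"].tolist())
```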
Rows and Columns
loc and iloc work the same way with DataFrames as they do with Series. It’s useful to note that both methods can address columns and rows together.
When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.
Consider the DataFrame defined below:
>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24
Then for example:
>>> df.loc['c': , :'z']  # rows 'c' and onwards AND columns up to 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22
>>> df.iloc[:, 3]        # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23
Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.
For example, consider the following DataFrame. How best to slice the rows up to and including ‘c’ and take the first four columns?
>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24
We can achieve this result using iloc and the help of another method:
>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13
get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
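The conversion can also go the other way: instead of turning the row label into a position for iloc, the column positions can be turned into labels for loc. The sketch below is one possible alternative, not the only way to do it, and it assumes the label 'c' is unique in the index (for duplicated labels, get_loc may return a slice or a boolean mask instead of an integer).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# Route 1 (from the answer): turn the row label into a position, then use iloc.
via_iloc = df.iloc[:df.index.get_loc("c") + 1, :4]

# Route 2 (alternative sketch): turn the column positions into labels, then use loc.
# The label slice :"c" already includes row 'c', so no +1 is needed.
via_loc = df.loc[:"c", df.columns[:4]]

print(via_iloc.equals(via_loc))
```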
Answered By: Anonymous
Disclaimer: This content is shared under the Creative Commons license CC BY-SA 3.0. It is generated from the StackExchange Website Network.