remove_duplicated_indexes

trashpanda.remove_duplicated_indexes(source_with_duplicates: pandas.core.series.Series, keep: Union[str, bool] = 'first') → pandas.core.series.Series
trashpanda.remove_duplicated_indexes(source_with_duplicates: pandas.core.frame.DataFrame, keep: Union[str, bool] = 'first') → pandas.core.frame.DataFrame

Removes rows with duplicated index labels from a DataFrame or Series. By default, the first occurrence of each duplicated label is kept.

Notes

This function assumes that the index of source_with_duplicates actually contains duplicates. Check index.is_unique beforehand.
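
The uniqueness check mentioned in this note might look like the following minimal sketch. It uses only pandas; the helper name has_duplicated_indexes is hypothetical and not part of trashpanda.

```python
import pandas as pd


def has_duplicated_indexes(source) -> bool:
    """Hypothetical helper: True if any index label occurs more than once."""
    # Index.is_unique is True only when every label appears exactly once.
    return not source.index.is_unique


sample = pd.Series([0, 1, 2], index=["a", "a", "b"])
unique_sample = pd.Series([0, 1], index=["a", "b"])

needs_cleanup = has_duplicated_indexes(sample)
already_clean = has_duplicated_indexes(unique_sample)
```

With such a guard, a caller could decide to invoke remove_duplicated_indexes only when duplicates are present.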

Parameters
  • source_with_duplicates (Union[DataFrame, Series]) – The DataFrame or Series from which rows with duplicated indexes are removed.

  • keep (Union[str, bool]) – Determines which duplicates to keep.

      - first: Default; keeps all first occurrences of duplicated indexes.
      - last: Keeps all last occurrences of duplicated indexes.
      - False: Drops all duplicated indexes.
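
The three keep options accept the same values as pandas' Index.duplicated. As an illustration only, and without claiming that trashpanda is implemented this way, the selection each option describes can be sketched with plain pandas on a Series:

```python
import pandas as pd

sample = pd.Series(
    [0, 1, 2, 3, 4],
    index=pd.Index(["a", "a", "b", "b", "c"], name="index"),
)

# Index.duplicated flags the entries treated as duplicates for a given
# `keep`; inverting the mask selects the rows that remain.
first_kept = sample[~sample.index.duplicated(keep="first")]
last_kept = sample[~sample.index.duplicated(keep="last")]
none_kept = sample[~sample.index.duplicated(keep=False)]

print(first_kept.tolist())  # [0, 2, 4]
print(last_kept.tolist())   # [1, 3, 4]
print(none_kept.tolist())   # [4]
```

The remaining values match the DataFrame examples in this section for the same index.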

Returns

Union[DataFrame, Series]

Examples

>>> import pandas
>>> import numpy as np
>>> from doctestprinter import doctest_print
>>> from trashpanda import remove_duplicated_indexes
>>> sample_frame = pandas.DataFrame(
...     np.arange(5),
...     columns=["location"],
...     index=pandas.Index(['a', 'a', 'b', 'b', 'c'], name="index")
... )
>>> doctest_print(sample_frame)
       location
index
a             0
a             1
b             2
b             3
c             4

Keeping the first occurrence.

>>> first_kept = remove_duplicated_indexes(
...     source_with_duplicates=sample_frame, keep="first"
... )
>>> doctest_print(first_kept)
       location
index
a             0
b             2
c             4

Keeping the last occurrence.

>>> last_kept = remove_duplicated_indexes(
...     source_with_duplicates=sample_frame, keep="last"
... )
>>> doctest_print(last_kept)
       location
index
a             1
b             3
c             4

Dropping every row whose index is duplicated.

>>> dropped_duplicates = remove_duplicated_indexes(
...     source_with_duplicates=sample_frame, keep=False
... )
>>> doctest_print(dropped_duplicates)
       location
index
c             4