remove_duplicated_indexes
- trashpanda.remove_duplicated_indexes(source_with_duplicates: pandas.core.series.Series, keep: Union[str, bool] = 'first') → pandas.core.series.Series
- trashpanda.remove_duplicated_indexes(source_with_duplicates: pandas.core.frame.DataFrame, keep: Union[str, bool] = 'first') → pandas.core.frame.DataFrame
Removes rows with duplicated indexes from a DataFrame or Series. By default, the first occurrence of each duplicated index is kept.
Notes
This function assumes that the source contains duplicated indexes. Check index.is_unique beforehand to decide whether the call is needed.
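The check mentioned in the note can be sketched with plain pandas. This is only an illustration: the frame and the variable names are made up for the example, and only `Index.is_unique` and `Index.duplicated` are pandas calls.

```python
import pandas as pd

# Illustrative frame with repeated index labels (not taken from trashpanda).
frame = pd.DataFrame(
    {"location": range(5)},
    index=pd.Index(["a", "a", "b", "b", "c"], name="index"),
)

# Index.is_unique is False as soon as any label occurs more than once.
has_duplicates = not frame.index.is_unique

# Index.duplicated(keep=False) marks every entry whose label is repeated,
# which makes it easy to list the affected labels.
duplicated_labels = (
    frame.index[frame.index.duplicated(keep=False)].unique().tolist()
)

print(has_duplicates)
print(duplicated_labels)
```

If `has_duplicates` is False, the index is already unique and there is nothing to remove.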
- Parameters
source_with_duplicates (Union[DataFrame, Series]) – The DataFrame or Series from which duplicated indexes are removed.
keep (Union[str, bool]) – Determines which duplicates to keep.
  - first: Default; keeps the first occurrence of each duplicated index.
  - last: Keeps the last occurrence of each duplicated index.
  - False: Drops all occurrences of duplicated indexes.
- Returns
Union[DataFrame, Series]
Examples
>>> import numpy as np
>>> import pandas
>>> from doctestprinter import doctest_print
>>> from trashpanda import remove_duplicated_indexes
>>> sample_frame = pandas.DataFrame(
...     np.arange(5),
...     columns=["location"],
...     index=pandas.Index(['a', 'a', 'b', 'b', 'c'], name="index")
... )
>>> doctest_print(sample_frame)
       location
index
a             0
a             1
b             2
b             3
c             4
Keeping the first occurrence.
>>> first_kept = remove_duplicated_indexes(
...     source_with_duplicates=sample_frame, keep="first"
... )
>>> doctest_print(first_kept)
       location
index
a             0
b             2
c             4
Keeping the last occurrence.
>>> last_kept = remove_duplicated_indexes(
...     source_with_duplicates=sample_frame, keep="last"
... )
>>> doctest_print(last_kept)
       location
index
a             1
b             3
c             4
Dropping all duplicates.
>>> dropped_duplicates = remove_duplicated_indexes(
...     source_with_duplicates=sample_frame, keep=False
... )
>>> doctest_print(dropped_duplicates)
       location
index
c             4
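The three `keep` modes have the same names and semantics as the `keep` argument of pandas' `Index.duplicated`. The following is a minimal sketch of equivalent behaviour using plain pandas, applied to a Series. The helper `drop_duplicated_index_sketch` is a hypothetical stand-in written for this example; it is not trashpanda's implementation, which may differ.

```python
import pandas as pd


def drop_duplicated_index_sketch(source, keep="first"):
    """Hypothetical stand-in for illustration, not trashpanda's code."""
    # Index.duplicated returns a boolean array marking the entries to
    # discard; keep accepts 'first', 'last' or False.
    to_discard = source.index.duplicated(keep=keep)
    # Boolean selection with .loc works for both Series and DataFrame.
    return source.loc[~to_discard]


sample_series = pd.Series(
    [0, 1, 2, 3, 4],
    index=pd.Index(["a", "a", "b", "b", "c"], name="index"),
    name="location",
)

first_kept = drop_duplicated_index_sketch(sample_series, keep="first")
last_kept = drop_duplicated_index_sketch(sample_series, keep="last")
all_dropped = drop_duplicated_index_sketch(sample_series, keep=False)

print(first_kept.tolist())
print(last_kept.tolist())
print(all_dropped.tolist())
```

The sketch selects rows with a boolean mask instead of modifying the source, so the input Series is left unchanged.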