Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure
#      A      B         C         D
# 0  foo    one -1.202872 -0.055224
# 1  bar    one -1.814470  2.395985
# 2  foo    two  1.018601  1.552825
# 3  bar  three -0.595447  0.166599
# 4  foo    two  1.395433  0.047609
# 5  bar    two -0.392670 -0.136473
# 6  foo    one  0.007207 -0.561757
# 7  foo  three  1.928123 -1.623033

# Select the numeric columns explicitly; recent pandas versions would
# otherwise also "sum" (concatenate) the strings in column B.
df.groupby('A')[['C', 'D']].sum()
#             C        D
# A
# bar -2.802588  2.42611
# foo  3.146492 -0.63958

# Grouping by several columns produces a hierarchical (MultiIndex) result.
df.groupby(['A', 'B']).sum()
#                   C         D
# A   B
# bar one   -1.814470  2.395985
#     three -0.595447  0.166599
#     two   -0.392670 -0.136473
# foo one   -1.195665 -0.616981
#     three  1.928123 -1.623033
#     two    2.414034  1.600434
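The three steps listed above (split, apply, combine) can also be written out by hand, which may make it clearer what `groupby` does internally. The following is a minimal sketch on a small made-up frame; `df_demo`, `pieces`, `manual` and `auto` are illustrative names and the values are not the ones printed above.

```python
import pandas as pd

# Hypothetical toy data, not the frame shown above.
df_demo = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Split: one sub-frame per distinct key in column A.
pieces = {key: group for key, group in df_demo.groupby("A")}

# Apply: reduce each piece independently.
sums = {key: group["C"].sum() for key, group in pieces.items()}

# Combine: assemble the per-group results into a single Series.
manual = pd.Series(sums, name="C").sort_index()
manual.index.name = "A"

# The one-liner performs the same three steps in one call.
auto = df_demo.groupby("A")["C"].sum()

print(manual)
print(auto)
```

Both `manual` and `auto` should hold one total per key of `A`; in practice the one-liner is the form to use.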
Reshaping
The stack() method “compresses” a level in the DataFrame’s columns.
#                      A         B
# first second
# bar   one     0.029399 -0.542108
#       two     0.282696 -0.087302
# baz   one    -1.575170  1.771208
#       two     0.816482  1.100230

stacked = df.stack()
# first  second
# bar    one     A    0.029399
#                B   -0.542108
#        two     A    0.282696
#                B   -0.087302
# baz    one     A   -1.575170
#                B    1.771208
#        two     A    0.816482
#                B    1.100230
The inverse operation of stack() is unstack(), which by default unstacks the last level:
stacked.unstack()
#                      A         B
# first second
# bar   one     0.029399 -0.542108
#       two     0.282696 -0.087302
# baz   one    -1.575170  1.771208
#       two     0.816482  1.100230

stacked.unstack(1)
# second        one       two
# first
# bar   A  0.029399  0.282696
#       B -0.542108 -0.087302
# baz   A -1.575170  0.816482
#       B  1.771208  1.100230
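For readers who want to reproduce the round trip, here is a self-contained sketch. The construction of the frame is an assumption (the original setup code is not shown here), and the values are simple placeholders rather than the random numbers printed above; `wide`, `restored` and `by_second` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed construction: a two-level row index named like the one printed above.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)

# Placeholder values 0.0 .. 7.0 instead of random numbers.
wide = pd.DataFrame(
    np.arange(8.0).reshape(4, 2), index=index, columns=["A", "B"]
)

# stack(): the column labels A/B become the innermost index level.
stacked = wide.stack()

# unstack(): by default the last index level moves back to the columns.
restored = stacked.unstack()

# Unstacking a named level instead; equivalent to unstack(1) here.
by_second = stacked.unstack("second")

print(stacked)
print(by_second)
```

Passing a level name rather than a position tends to be easier to read when the index has more than two levels.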
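# Rename the categories to more meaningful names. Assigning directly to
# .cat.categories is not supported in recent pandas versions, so use
# rename_categories() and assign the result back.
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])
df["grade"]
# 0    very good
# 1         good
# 2         good
# 3    very good
# 4    very good
# 5     very bad
# Name: grade, dtype: category
# Categories (3, object): [very good, good, very bad]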
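The outputs further down list five categories running from "very bad" to "very good", which suggests the categories were reordered and extended at some point. The sketch below shows one way to get there with `set_categories`. The frame is reconstructed from the printed `id` and `raw_grade` columns, and `df_demo` is an illustrative name, so treat it as an assumption about the setup rather than the original code.

```python
import pandas as pd

# Reconstructed from the printed id / raw_grade columns (an assumption).
df_demo = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "raw_grade": ["a", "b", "b", "a", "a", "e"],
})
df_demo["grade"] = df_demo["raw_grade"].astype("category")

# Rename the inferred categories a, b, e (kept in that order).
df_demo["grade"] = df_demo["grade"].cat.rename_categories(
    ["very good", "good", "very bad"]
)

# Reorder the categories and add the ones that do not occur in the data.
df_demo["grade"] = df_demo["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"]
)

print(df_demo["grade"].cat.categories)
```

Both methods return a new object, so the result has to be assigned back to the column.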
The column can then be sorted or grouped by category. Sorting follows the order of the categories, not lexical order.
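df.sort_values(by="grade")
#    id raw_grade      grade
# 5   6         e   very bad
# 1   2         b       good
# 2   3         b       good
# 0   1         a  very good
# 3   4         a  very good
# 4   5         a  very good

# observed=False keeps categories that have no rows (bad, medium) in the result;
# the default for this parameter differs between pandas versions.
df.groupby("grade", observed=False).size()
# grade
# very bad     1
# bad          0
# medium       0
# good         2
# very good    3
# dtype: int64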
Plotting
import matplotlib.pyplot as plt

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
plt.show()