pandas

ybs

Competition

import pandas as pd
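# Load the dataset
df = pd.read_csv("/data/SecondhandHouse.csv")

# Shape (rows, columns) of the original dataset
original_shape = df.shape

# Replace '暂无' ("not available") entries with null, then drop every row that contains a null
# Note: this matches cells whose whole value is '暂无'
df.replace('暂无', pd.NA, inplace=True)
df.dropna(inplace=True)

# Shape of the dataset after dropping nulls
after_dropna_shape = df.shape

# Count the duplicated rows
duplicates_count = df.duplicated().sum()

# Drop the duplicated rows
df.drop_duplicates(inplace=True)

# Shape of the dataset after dropping duplicates
after_drop_duplicates_shape = df.shape

# Strip the area unit '平米' (square metres) from the '建筑面积' (floor area) column, keep the number, cast to float
df['建筑面积'] = df['建筑面积'].str.replace('平米', '').astype(float)

# Strip the year unit '年' from the '建筑年代' (year built) column, keep the number, cast to int
df['建筑年代'] = df['建筑年代'].str.replace('年', '').astype(int)

# Keep only the houses built in or before 2021
df = df[df['建筑年代'] <= 2021]

# Shape of the dataset after the year filter
after_year_filter_shape = df.shape

# Extract the number of bedrooms (室), living rooms (厅) and bathrooms (卫) from the '户型' (layout) column
# A raw string keeps \d from being treated as a string escape
df[['室', '厅', '卫']] = df['户型'].str.extract(r'(\d+)室(\d+)厅(\d+)卫').astype(int)


# House age = current year (2022) minus the year built, stored in the '房龄' (house age) column
df['房龄'] = 2022 - df['建筑年代']

# Print the result of each processing step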
print(original_shape)
print(after_dropna_shape)
print(duplicates_count)
print(after_drop_duplicates_shape)
print(after_year_filter_shape)

df.to_csv('/data/result.csv', index=False)
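The string-cleaning steps in the script can be tried without the competition file. The following is a minimal sketch on a made-up two-row frame; the sample values are assumptions for illustration, not rows from SecondhandHouse.csv.

```python
import pandas as pd

# Hypothetical sample rows that mimic the three text columns of the dataset
sample = pd.DataFrame({
    "户型": ["3室2厅1卫", "2室1厅1卫"],
    "建筑面积": ["89.5平米", "60平米"],
    "建筑年代": ["2010年", "2018年"],
})

# One capture group per number; the raw string keeps \d intact
rooms = sample["户型"].str.extract(r"(\d+)室(\d+)厅(\d+)卫").astype(int)
rooms.columns = ["室", "厅", "卫"]

# regex=False treats the unit as literal text rather than a pattern
area = sample["建筑面积"].str.replace("平米", "", regex=False).astype(float)
year = sample["建筑年代"].str.replace("年", "", regex=False).astype(int)
age = 2022 - year

print(rooms)
print(area.tolist(), age.tolist())
```

Passing `regex=False` is a defensive choice here: the units contain no special characters, so the result is the same either way, but the intent is clearer.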


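Reading data with pandas

Data type	Description	pandas reader
tsv, txt, csv	Plain-text files delimited by commas or tabs	pd.read_csv
excel	Microsoft xls or xlsx files	pd.read_excel
mysql	Relational database tables	pd.read_sql

.head() returns the first few rows

.shape returns the number of rows and columns

.columns returns the column labels

.index returns the index

.dtypes shows the data type of each column

Reading a txt file with a custom delimiter and column names

pd.read_csv(
    path,
    sep="\t",
    header=None,
    names=['1', '2', '3']
)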
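To see what `sep`, `header` and `names` do without a file on disk, here is a small sketch that reads from an in-memory string. The content and the column names `date`, `pv`, `uv` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for a real .txt file
raw = "2024-01-01\t10\t3\n2024-01-02\t12\t5\n"

df_txt = pd.read_csv(
    io.StringIO(raw),            # a file path would normally go here
    sep="\t",                    # tab is the field delimiter
    header=None,                 # the data has no header row
    names=["date", "pv", "uv"],  # column names supplied by hand
)

print(df_txt.shape)
print(df_txt.columns.tolist())
```

Without `header=None`, the first data row would be consumed as the header, so the frame would come back one row short.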

Reading from MySQL

import pymysql
conn = pymysql.connect(
	host='127.0.0.1',
	user='root',
	password='123456',
	database='test',
	charset='utf8'
)
mysql_page = pd.read_sql("select * from name",con=conn)

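Data structures

DataFrame & Series

1. Series

A Series is a one-dimensional, array-like object made up of a sequence of values (which may be of different types) and an associated set of data labels, the index.

pd.Series(data=None, index=None, dtype=None, name=None, copy=False)

data: the input data; may be a list, a scalar, an ndarray, and so on. If a dict is passed, its insertion order is kept.
index: the index values. They must be hashable (immutable types such as str, bytes and numbers) and have the same length as the data. Non-unique index values are allowed. If omitted, defaults to RangeIndex(0, 1, 2, …, n-1).
dtype: the data type of the resulting Series. If omitted, it is inferred from the data.
name: a name for the Series.
copy: whether to copy the input data; defaults to False. Only affects Series and ndarray input.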
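The constructor parameters above can be combined freely. A minimal sketch, with made-up values and names:

```python
import pandas as pd

# Explicit index, dtype and name
s = pd.Series(data=[1, 2, 3], index=["a", "b", "c"], dtype="float64", name="demo")
print(s)

# A scalar is broadcast to every index label
scalar = pd.Series(5, index=["x", "y"])
print(scalar)
```

Here `dtype="float64"` overrides the integer type that pandas would otherwise infer from `[1, 2, 3]`.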

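Creating a Series from a list or array

s1 = pd.Series([1, 'a', 5.2, 7])

0      1
1      a
2    5.2
3      7
dtype: object

Use the index and values attributes to get the labels and the values

Get the index

s1.index

Get the data

s1.values

Read or modify a value through its label

s1[1] # get the value whose label is 1

s1[2] = 50 # change the value whose label is 2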

Creating a Series from a dict

sdata = {'oh': 3500, 'te': 7200, 'or': 160, 'ui': 222}

s3 = pd.Series(sdata)

Getting a value

s3['oh']     # by label

s3.iloc[0]   # by position

Multiple values

s3[['oh', 'te']]

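The index parameter

Create a Series with a label index

s2 = pd.Series([1, 'a', 5.2, 7], index=['a', 'b', 'c', 'd'])

When a requested index value has no matching dict key, the value is filled with NaN (not a number).

d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['x', 'b', 'z'])

Matching index values can also be used to change the order of the data in the new Series

d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['c', 'b', 'a'])
ser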

The name parameter

A Series object can be given a name, and so can its index. pandas exposes both through a name attribute: set name on the Series itself, or on its index.

dict_data1 = {
    "Beijing": 2200,
    "Shanghai": 2500,
    "Shenzhen": 1700
}
data1 = pd.Series(dict_data1)
data1

data1.name = "City_Data"
data1.index.name = "City_Name"

Indexing and slicing a Series

1. Positional indexing

2. Label indexing
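The two kinds of indexing differ mainly in how slices treat the end point. A short sketch with made-up values, using `iloc` for positions and `loc` for labels:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# 1. Positional indexing: integer positions, slice end is excluded
by_pos = s.iloc[1]
pos_slice = s.iloc[1:3]       # positions 1 and 2 only

# 2. Label indexing: index labels, slice end is included
by_label = s.loc["b"]
label_slice = s.loc["b":"d"]  # labels b, c and d

print(pos_slice.tolist(), label_slice.tolist())
```

Spelling out `iloc` and `loc` avoids the ambiguity of plain `s[...]` when the index labels are themselves integers.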

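Basic Series techniques

1. Viewing the first and last rows

import numpy as np

s = pd.Series(np.random.rand(15))
print(s.head())   # first 5 rows by default
print(s.head(1))  # first row only
print(s.tail())   # last 5 rows by default

2. Reindexing: reindex

s = pd.Series(np.random.rand(5), index=list("abcde"))
# reindex returns a new object; labels that did not exist in the old index get NaN
s1 = s.reindex(list("cde"))
# Set a fill value for labels that did not exist
s2 = s.reindex(list("cde12"), fill_value=0)
print(s2)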

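Aligned arithmetic

Alignment is an important part of data cleaning. Operations between two Series are aligned on the index; labels that are not present in both operands produce NaN, which can be filled afterwards.

s1 = pd.Series(np.random.rand(3), index=["Kelly", "Anne", "T-C"])
s2 = pd.Series(np.random.rand(3), index=["Anne", "Kelly", "LiLy"])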
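With fixed numbers in place of random ones, the alignment is easy to follow. This sketch reuses the labels from the example above; the values are made up.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["Kelly", "Anne", "T-C"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["Anne", "Kelly", "LiLy"])

# Values are matched by label, not by position
total = s1 + s2                    # T-C and LiLy have no partner, so they become NaN

# add() with fill_value treats a missing partner as 0 instead
filled = s1.add(s2, fill_value=0)

print(total)
print(filled)
```

An alternative to `fill_value` is to compute `s1 + s2` first and call `fillna()` on the result, which replaces the NaN values after the fact rather than supplying a stand-in operand.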

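Deleting and adding

Deleting

s = pd.Series(np.random.rand(5), index=list("abcde"))
s1 = s.drop("a")  # returns a new Series without "a"; s is unchanged (inplace=False by default)

s = pd.Series(np.random.rand(5), index=list("abcde"))
s.drop("a", inplace=True)  # modifies s itself and returns None
# Equivalent: s = s.drop("a")
# inplace defaults to False; with inplace=True the call returns None

Adding

import pandas as pd

# Add
s1 = pd.Series(np.random.rand(5), index=list("abcde"))
s1["s"] = 100  # adds the label if it does not exist, otherwise overwrites its value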

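DataFrame

A DataFrame is a tabular data structure

Each column can hold a different value type (numeric, string, boolean, and so on)
It has both a row index (index) and a column index (columns)
It can be viewed as a dict of Series

The most common way to create a DataFrame is to read one in; see section 02 (reading plain-text files, Excel, and MySQL databases)

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

data: the input data; may be an ndarray, Series, list, dict, scalar, or another DataFrame
index: row labels. If omitted, defaults to RangeIndex(0, 1, 2, …, n-1), where n is the number of rows.
columns: column labels. If omitted, defaults to RangeIndex(0, 1, 2, …, n-1), where n is the number of columns.
dtype: the data type to force. Only a single dtype is allowed. If omitted, it is inferred.
copy: whether to copy the input data. For dict input, the default behaves like copy=True (the data is copied). For DataFrame or ndarray input, the default behaves like copy=False (a view is used).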

Creating from a flat list

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)

Creating from a nested list

# Each element of the list is one row of data
data = [['xiaowang', 20], ['Lily', 30], ['Anne', 40]]
# No column labels assigned
df = pd.DataFrame(data)

data = [['xiaowang', 20], ['Lily', 30], ['Anne', 40]]
# Assign column labels
df = pd.DataFrame(data, columns=['Name', 'Age'])

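Creating a DataFrame from a dict of sequences

# Before Python 3.6, dicts did not keep the insertion order of their key -> value pairs
# From Python 3.7 on, insertion order is guaranteed
data = {'Name': ['关羽', '刘备', '张飞', '曹操'], 'Age': [28, 34, 29, 42]}
# Create the DataFrame from the dict
df = pd.DataFrame(data)
print(df)
# Print the row labels
print(df.index)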

Adding custom row labels

# The dict
data = {'Name': ['关羽', '刘备', '张飞', '曹操'], 'Age': [28, 34, 29, 42]}
# Define the row labels
index = ["rank1", "rank2", "rank3", "rank4"]
# Create the DataFrame from the dict
df = pd.DataFrame(data, index=index)
print(df)
# Print the row labels
print(df.index)
# Print the column labels
print(df.columns)

Creating a DataFrame from a list of dicts

A list of dicts can be passed to the DataFrame constructor as input data. By default, the dict keys are used as column names.

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
# df = pd.DataFrame(data)
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

        a   b     c
first   1   2   NaN
second  5  10  20.0

Creating a DataFrame from Series

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

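# Create the data
data = {
    "Name": pd.Series(['xiaowang', 'Lily', 'Anne']),
    "Age": pd.Series([20, 30, 40], dtype=float),
    "gender": pd.Series(["男", "男", "女"]),
    "salary": pd.Series([5000, 8000, 10000], dtype=float)
}
df = pd.DataFrame(data)
# When a column's values all fit int, int is used automatically; otherwise the dtype is inferred
# Building each column as a Series with its own dtype is one way to give different columns custom data types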

Selecting columns

df[['year', 'pop']]

# Note: multiple columns cannot be selected with a slice inside []
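The note about slices is easy to check: a slice inside plain `[]` selects rows, so a range of columns has to go through `loc`. A minimal sketch; the frame and its `year`, `pop` and `state` columns are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2020, 2021],
    "pop": [1.5, 1.7],
    "state": ["Ohio", "Ohio"],
})

one_col = df["year"]                 # a single label returns a Series
two_cols = df[["year", "pop"]]       # a list of labels returns a DataFrame
rows = df[0:1]                       # a slice in [] selects rows, not columns
col_range = df.loc[:, "year":"pop"]  # a column range needs loc; the end label is included

print(two_cols)
print(col_range.columns.tolist())
```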

Adding columns

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Use df['column'] = value to insert a new column
print("Add a new column from a Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)
# Add existing columns together to create a new column
print("Add existing columns together to create a new column:")
df['four'] = df['one'] + df['three']
print(df)

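Inserting a new column with insert()

df.insert(loc, column, value, allow_duplicates=False)

loc: int, the insertion position; must satisfy 0 <= loc <= len(columns)

column: label of the inserted column; may be a string, a number, or any hashable object

value: a scalar, Series, or array

allow_duplicates: whether a column label that already exists may be inserted again; defaults to False

info = [['王杰', 18], ['李杰', 19], ['刘杰', 17]]
df = pd.DataFrame(info, columns=['name', 'age'])
print(df)
# Note that the parameter is named column (singular)
# The first argument, 2, is the position in the column list where the new column goes
df.insert(2, column='score', value=[91, 90, 75])
print("=data after df.insert:===")
print(df)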

Deleting columns

Both del and pop() can delete a column from a DataFrame; pop() also returns the removed column.

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)
# Delete with del
del df['one']
print("=del df['one']===")
print(df)
# Delete with pop()
res_pop = df.pop('two')
print("=df.pop('two')===")
print(df)
# The removed column is returned as a Series
print("=res_pop = df.pop('two')===")
print(res_pop)

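Selecting rows

import pandas as pd

# Define the dict
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
# Create the DataFrame
df = pd.DataFrame(d)
print("===original df")
print(df)
# Select the row labelled b
print("===row labelled b")
print(df.loc['b'])

df.loc['b', "one"] selects the value where row b and column one intersect

Rows and columns can also be sliced

# Rows from label b to label d, column one
df.loc['b':'d', "one"]  # a label slice includes the end row
# Note the difference from NumPy integer-array indexing
df.loc[['a', 'b'], ["one", "two"]]  # two arguments: the first selects rows, the second selects columns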

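Integer-position indexing and slicing

Use iloc for selection based purely on integer position

# Row at position 2
df.iloc[2]
# Rows at positions 0 and 2
df.iloc[[0, 2]]
# Value at row position 0, column position 1
df.iloc[0, 1]
# Rows at positions 1 up to, but not including, 3
print("=df.iloc[1:3]:===")
print(df.iloc[1:3])
# A plain slice also extracts rows directly
print("=df[1:3]:===")
print(df[1:3])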

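Adding rows

The append() function adds new rows to a DataFrame, placing them after the last existing row. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append() examples in this section need pandas < 2.0; on current versions pd.concat does the same job.

df.append(other, ignore_index=False, verify_integrity=False, sort=False)

Appends "other" to the end of the caller and returns a new object. Columns of "other" that are not in the caller are added as new columns.

other: a DataFrame, a Series or dict-like object, or a list of these
ignore_index: defaults to False. If True, the index labels are not used and the result is renumbered 0, 1, 2, …
verify_integrity: defaults to False. If True, a ValueError is raised when the result would have duplicate index labels.
sort: whether to sort the columns when the columns of the caller and of "other" are not aligned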
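Since `DataFrame.append` no longer exists in pandas 2.0 and later, the same row append is usually written with `pd.concat`. A minimal sketch, using a shortened version of the data from this section:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["关羽", "刘备"],
    "Age": [28, 34],
    "Salary": [5000, 8000],
})

# Wrap the new row in a one-row DataFrame, then concatenate
new_row = pd.DataFrame([{"Name": "诸葛亮", "Age": 30}])
df_new = pd.concat([df, new_row], ignore_index=True)

print(df_new)
```

As with `append`, a column missing from the new row (`Salary` here) is filled with NaN, and `ignore_index=True` renumbers the result from 0.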

import pandas as pd

data = {
    'Name': ['关羽', '刘备', '张飞', '曹操'],
    'Age': [28, 34, 29, 42],
    "Salary": [5000, 8000, 4500, 10000]
}
df = pd.DataFrame(data)

Appending a dict

d2 = {"Name": "诸葛亮", "Age": 30}
# Append a new row at the end (pandas < 2.0 only)
df3 = df.append(d2, ignore_index=True)  # ignore_index=True is required for a dict, otherwise a TypeError is raised
print(df3)

Appending a Series that has a name

d2 = {"Name": "诸葛亮", "Age": 30}
s = pd.Series(d2, name="a")
print(s)
# Append a new row at the end; the Series name becomes the row label
df3 = df.append(s)
print(df3)

Name    诸葛亮
Age      30
Name: a, dtype: object

  Name  Age   Salary
0   关羽   28   5000.0
1   刘备   34   8000.0
2   张飞   29   4500.0
3   曹操   42  10000.0
a  诸葛亮   30      NaN

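Appending a list

Note: append() can produce duplicate index labels; pass ignore_index=True to avoid that

If the list is one-dimensional, its values are appended down the first column
If the list is two-dimensional, each inner list is appended as a row
If the list is three-dimensional, only a single value is added

A two-dimensional list is appended as rows

a_l = [[10, "20", 30], [2, 5, 6]]
df4 = df.append(a_l)  # the list gets columns 0, 1, 2, which do not line up with Name/Age/Salary
print(df4)

data = {
    'Name': ['关羽', '刘备', '张飞', '曹操'],
    'Age': [28, 34, 29, 42],
    "Salary": [5000, 8000, 4500, 10000]
}
df = pd.DataFrame(data)
a_l = [[10, "20", 30], [2, 5, 6]]
df2 = pd.DataFrame(a_l, columns=["Name", "Age", "Salary"])  # column names are required, otherwise the columns do not line up
# Append df2 to df and return the result
df4 = df.append(df2)
print(df4)

data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8]
]  # a list gets the default column names 0, 1, 2, 3, so a_l lines up with these columns
df = pd.DataFrame(data)
print(df)
a_l = [[10, "20", 30], [2, 5, 6]]
df5 = df.append(a_l, ignore_index=True)
df5

A one-dimensional list is appended down the first column

data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8]
]
df = pd.DataFrame(data)
print(df)
a_l = [10, 20]
df3 = df.append(a_l)  # each value becomes a new row in column 0
print(df3)

    0    1    2    3
0   1  2.0  3.0  4.0
1   5  6.0  7.0  8.0
0  10  NaN  NaN  NaN
1  20  NaN  NaN  NaN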

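Deleting rows

Rows can be deleted from a DataFrame by their index label. If the label is duplicated, all rows that carry it are deleted together.

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])  # stack df2 under df; the index becomes 0, 1, 0, 1
print("=source data df=")
print(df)
# drop() is called here; by default it does not modify the source data
df1 = df.drop(0)
print("=modified data df1=")
print(df1)
# Two ways to make the change stick:
# 1. Assign the result back to the source variable
df = df.drop(0)
# 2. Pass inplace=True
df.drop(1, inplace=True)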
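The effect of a duplicated label is easiest to see by checking the index before and after each call. A small sketch that rebuilds the frame above with `pd.concat`:

```python
import pandas as pd

top = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
bottom = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])
df = pd.concat([top, bottom])   # index is 0, 1, 0, 1

# Both rows labelled 0 are removed; df itself is untouched
dropped = df.drop(0)
print(dropped)

# inplace=True changes df directly and returns None
result = df.drop(1, inplace=True)
print(df)
```

If only one of the duplicated rows should go, resetting the index first with `reset_index(drop=True)` gives every row a unique label to drop by.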

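Summary of common attributes and methods

Name	Description
T	Transposes rows and columns.
axes	Returns a list whose only members are the row axis labels and the column axis labels.
dtypes	Returns the data type of each column.
empty	True if the DataFrame holds no data, that is, if any axis has length 0.
columns	Returns all column labels of the DataFrame.
shape	Returns a tuple (rows, columns) giving the dimensions of the DataFrame.
size	Number of elements in the DataFrame.
values	The element values of the DataFrame as a NumPy array.
head()	Returns the first n rows.
tail()	Returns the last n rows.
rename()	rename(columns=dict) changes column names.
info()	Prints a summary: row and column counts, total memory usage, the data type of each column, and the number of non-missing values.
sort_index()	Sorts rows by row label by default, or columns by column label with axis=1.
sort_values()	Sorts by the values of a column, or by the values of a row.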

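Renaming labels with rename()

DataFrame.rename(index=None, columns=None, inplace=False)

index: a mapping from old row labels to new row labels
columns: a mapping from old column labels to new column labels
inplace: defaults to False, which leaves the source data unchanged and returns the renamed data. True modifies the source data.

# Rename row labels of df
df.rename(index={1: "row2", 2: "row3"})
# Rename column labels of df
df.rename(columns={"Name": "name", "Age": "age"})
# With inplace=True, the original data is modified
df.rename(index={1: "row2", 2: "row3"},
          columns={"Name": "name", "Age": "age"}, inplace=True)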

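df.sort_index()

sort_index(axis=0, ascending=True, inplace=False)

Purpose: sorts all rows by row label by default, or all columns by column label.

Note: use df.sort_index() only for sorting by row labels or by column labels, and df.sort_values() for every other kind of sort. (Old pandas versions let sort_index() sort by column values through a by argument, duplicating sort_values(); that argument has since been removed.)

axis: 0 sorts by row label; 1 sorts by column label
ascending: defaults to True (ascending); False sorts in descending order
inplace: defaults to False; if True, the sorted data replaces the original data

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [3, 1, 2],
    'B': [6, 5, 4]
}, index=['b', 'c', 'a'])

# Sort by row index, ascending
sorted_df = df.sort_index()
print("Sorted by row index:")
print(sorted_df)

# Sort by column label, ascending
sorted_df_columns = df.sort_index(axis=1)
print("\nSorted by column index:")
print(sorted_df_columns)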

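df.sort_values()

DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

Purpose: sorts by the values of columns, or by the values of rows.

Note: the by parameter is required, that is, the columns or rows to sort by must be named. Sorting by the index labels or column labels themselves is the job of .sort_index().

by: str or list of str. With axis=0, by="column name"; with axis=1, by="row label".
axis: {0 or 'index', 1 or 'columns'}, default 0. The default reorders rows according to the values in the given column(s), a vertical sort; 1 reorders columns according to the values in the given row(s), a horizontal sort.
ascending: bool; True sorts in ascending order. With by=['col1', 'col2'] it may be a list such as [True, False], meaning the first key ascending and the second descending.
inplace: bool; whether the sorted data replaces the existing DataFrame.
na_position: {'first', 'last'}, default 'last'; missing values are placed last by default.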

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 20, 30],
    'Score': [88, 92, 85]
})

# Sort by age, ascending
sorted_df = df.sort_values(by='Age')
print("Sorted by age:")
print(sorted_df)

# Sort by score, descending
sorted_df_desc = df.sort_values(by='Score', ascending=False)
print("\nSorted by score, descending:")
print(sorted_df_desc)

# Source data
df = pd.DataFrame({'b': [1, 2, 3, 2], 'a': [4, 3, 2, 1], 'c': [1, 3, 8, 2]},
                  index=[2, 0, 1, 3])

Sort by column b, ascending

# Same as df.sort_values(by='b', axis=0)
df.sort_values(by='b')

Sort by column b descending, then by column a ascending

df.sort_values(by=['b', 'a'], ascending=[False, True])
# Same as df.sort_values(by=['b', 'a'], axis=0, ascending=[False, True])

Sort by row 3, ascending

df.sort_values(by=3, axis=1)  # axis=1 is required

Sort by row 3 ascending, then by row 0 descending

df.sort_values(by=[3, 0], axis=1, ascending=[True, False])
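The `na_position` parameter is described above but not exercised by the examples. A short sketch with one missing value, alongside a two-key sort; the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"b": [1, 2, np.nan, 2], "a": [4, 3, 2, 1]})

# Missing values go last by default; na_position="first" moves them to the top
nan_first = df.sort_values(by="b", na_position="first")

# b descending, ties broken by a ascending; the NaN row still goes last
multi = df.sort_values(by=["b", "a"], ascending=[False, True])

print(nan_first)
print(multi)
```

Note that `ascending=False` does not move the missing value to the top: its placement is controlled only by `na_position`.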

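The time module

Timestamps are not affected by time zones, which makes them convenient for calculations

a. timestamp: the offset in seconds counted from 00:00:00 on 1 January 1970
b. struct_time: a time tuple with nine elements
c. format time: a formatted time string, which is easier to read. Both custom and fixed formats are supported.

Time format conversions:

1. Timestamp to struct_time

import time

# Current timestamp
timestamp = time.time()

# Convert to struct_time
struct_time = time.localtime(timestamp)
print(struct_time)

2. struct_time to formatted time

# Format the struct_time with strftime
format_time = time.strftime('%Y-%m-%d %H:%M:%S', struct_time)
print(format_time)

3. Formatted time to timestamp

# Parse the formatted time back into a struct_time with strptime
parsed_time = time.strptime(format_time, '%Y-%m-%d %H:%M:%S')

# Convert the struct_time to a timestamp
timestamp_from_format = time.mktime(parsed_time)
print(timestamp_from_format)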

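Examples of the main ways to create and convert times

# Import the time module
import time

# Create a timestamp
time.time()

# Formatted string to struct_time
time.strptime('2011-05-05 16:37:06', '%Y-%m-%d %X')

Structure of the struct_time tuple

Attribute	Value
tm_year (year)	for example 2011
tm_mon (month)	1 - 12
tm_mday (day)	1 - 31
tm_hour (hour)	0 - 23
tm_min (minute)	0 - 59
tm_sec (second)	0 - 61
tm_wday (weekday)	0 - 6 (0 is Monday)
tm_yday (day of the year)	1 - 366
tm_isdst (daylight saving time flag)	0, 1, or -1 (unknown)

Uses:

Extract the year, month, day and so on from a timestamp or a formatted time string
Serve as the bridge between timestamps and time strings

time_struct = time.strptime('2011-05-07 16:37:06', '%Y-%m-%d %X')
print(time_struct.tm_year)
print(time_struct.tm_mon)
print(time_struct.tm_mday)
print(time_struct.tm_hour)
print(time_struct.tm_min)

The % codes follow the same idea as Python string formatting:

my = 'aaa'
'%s' % my

my_int = 1
'%d' % my_int

"We work at {}".format('home')

addr = "home"
f"We work at {addr}"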

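Format codes for format time

Meaning of each code

%Y - year [0001, ..., 2018, 2019, ..., 9999]

%m - month [01, 02, ..., 11, 12]

%d - day [01, 02, ..., 30, 31]

%H - hour [00, 01, ..., 22, 23]

%M - minute [00, 01, ..., 58, 59]

%S - second [00, 01, ..., 61]

%X - the locale's time representation

%y - year without the century (00 - 99)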
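A quick way to check the codes is a round trip on a fixed time string. The sketch below spells out `%H:%M:%S` instead of `%X`, because the output of `%X` depends on the locale.

```python
import time

# Parse a fixed string into a struct_time
parsed = time.strptime("2011-05-05 16:37:06", "%Y-%m-%d %H:%M:%S")

# Format it back with the same codes, and with a two-digit year
text = time.strftime("%Y-%m-%d %H:%M:%S", parsed)
short = time.strftime("%y/%m/%d", parsed)

print(text)
print(short)
print(parsed.tm_wday, parsed.tm_yday)
```

5 May 2011 was a Thursday, so `tm_wday` is 3 (counting from Monday as 0), and it was the 125th day of that year.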

The datetime module

The datetime module wraps the time module and offers more interfaces. It provides these classes:

date, time, datetime, timedelta, tzinfo
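Of these classes, `timedelta` is the one that makes date arithmetic convenient. A minimal sketch with made-up dates:

```python
from datetime import date, datetime, timedelta

start = datetime(2024, 3, 31, 8, 30)

# Adding a timedelta shifts a datetime; month boundaries are handled automatically
later = start + timedelta(days=1, hours=2)

# Subtracting two dates yields a timedelta
gap = date(2024, 4, 17) - date(2024, 3, 31)

print(later)
print(gap.days)
```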

The date class

datetime.date(year, month, day)

Class methods

date.today(): returns a date object for the current local date;
date.fromtimestamp(timestamp): returns a date object for the given timestamp;

from datetime import date
import time

print('date.today():', date.today())
print('date.fromtimestamp():', date.fromtimestamp(time.time()))

date.today(): 2024-03-31
date.fromtimestamp(): 2024-03-31

Methods and attributes

d1 = date(2011, 6, 3)  # a date object

d1.year, d1.month, d1.day: the year, month and day;
d1.replace(year, month, day): creates a new date object in which the given year, month and day replace the original values (the original object stays unchanged);
d1.timetuple(): returns the time.struct_time that corresponds to the date;
d1.weekday(): returns the weekday; Monday is 0, Tuesday is 1, and so on;
d1.isoweekday(): returns the weekday; Monday is 1, Tuesday is 2, and so on;
d1.isoformat(): returns a string in the form 'YYYY-MM-DD';
d1.strftime(fmt): uses the same format codes as the time module.

now = date(2021, 10, 26)
print(now.year, now.month, now.day)
tomorrow = now.replace(day=27)
print('now:', now, ', tomorrow:', tomorrow)
print('timetuple():', now.timetuple())
print('weekday():', now.weekday())
print('isoweekday():', now.isoweekday())
print('isoformat():', now.isoformat())
print('strftime():', now.strftime("%Y-%m-%d"))

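Time conversion

to_datetime: converting to timestamps

We often work with text data (strings). Is there a quick way to turn that text into timestamps?

to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, unit=None, infer_datetime_format=False, origin='unix')

The function converts an array, a Series, or a dict-like object to datetime objects

arg: the object to convert to datetime
errors: how errors are handled
- If 'raise', an invalid value raises an exception.
- If 'coerce', an invalid value is set to NaT.
- If 'ignore', an invalid value returns the input unchanged.
dayfirst, yearfirst: specify the parsing order of the date parts (day first or year first)
utc: controls time-zone-related parsing, localization and conversion (not covered here)
format: the strftime format used to parse the time, for example "%d/%m/%Y"; allows custom formats
unit: D, s, ms and so on; the unit used when converting a numeric timestamp to datetime
origin: defines the reference date. Numeric values are parsed as the number of units since this reference date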
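The `errors`, `format` and `unit` parameters can be tried on a few literal values. A minimal sketch; the inputs are made up.

```python
import pandas as pd

# errors="coerce" turns values that cannot be parsed into NaT instead of raising
parsed = pd.to_datetime(["2024-03-31", "not a date"], errors="coerce", format="%Y-%m-%d")

# format describes a custom layout, here day/month/year
custom = pd.to_datetime("31/03/2024", format="%d/%m/%Y")

# unit="s" reads a number as seconds since the Unix epoch
from_seconds = pd.to_datetime(1711843200, unit="s")

print(parsed)
print(custom, from_seconds)
```

With a list as input the result is a `DatetimeIndex`; with a single string or number it is a single `Timestamp`.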

# origin: the reference start date

pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('2020-01-11'))

DatetimeIndex(['2020-01-12', '2020-01-13', '2020-01-14'],