Python raises UnicodeEncodeError: 'gbk' codec can't encode character '\u2f64' in position 123362: illegal multibyte sequence

Category: Python

Posted by: hp

Published: 2024-11-26

I. Cause

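While scraping data with Python, writing the results to a file failed because the text could not be encoded as GBK. When open() is called without an encoding argument, Python uses the locale's default encoding, which is GBK (cp936) on Chinese-locale Windows systems. The scraped text contained a character, U+2F64, that GBK cannot represent, so the write raised: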

UnicodeEncodeError: 'gbk' codec can't encode character '\u2f64' in position 123362: illegal multibyte sequence
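To see which character triggers the error, you can catch the exception and read its start attribute, the index of the first character the codec could not encode in the string being encoded. The sketch below is illustrative only: first_unencodable is a hypothetical helper and the sample string is made up; it simply confirms that U+2F64 fails under GBK but encodes fine as UTF-8, and prints the locale encoding open() falls back to on your machine.

```python
import locale


def first_unencodable(text: str, encoding: str = "gbk"):
    """Return (index, char) for the first character `encoding` cannot encode, or None."""
    try:
        text.encode(encoding)  # strict mode raises on the first unencodable character
    except UnicodeEncodeError as exc:
        # exc.start is the index into `text` where encoding failed
        return exc.start, text[exc.start]
    return None


# The locale encoding open() uses when encoding= is omitted (platform dependent)
print(locale.getpreferredencoding(False))

sample = "abc\u2f64d"  # made-up sample containing the character from the error
# ascii() keeps the output printable even on a GBK console
print(ascii(first_unencodable(sample, "gbk")))    # (3, '\u2f64')
print(ascii(first_unencodable(sample, "utf-8")))  # None
```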

II. Solution

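1: Check the encoding declared in the website's headers and write the file with the same encoding (here UTF-8) by passing it explicitly to open(). After scraping again, the file is written without errors.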

with open('pqms.txt', 'w', encoding='utf-8') as f:
    f.write(str(menuList))
# List URLs are done; next, scrape the details
print("Saved to local file")
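The first step above says to check the site's headers. One way to do that is to read the charset parameter from the response's Content-Type header. The sketch below uses only the standard library and assumes you already have the header value as a string (with requests, for example, response.headers.get('Content-Type', '')). charset_from_content_type is a hypothetical helper, and falling back to UTF-8 when no charset is declared is an assumption, chosen because UTF-8 can encode every character.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the charset declared in a Content-Type header value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter lower-cased, or None if absent
    return msg.get_content_charset() or default


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("application/json"))          # utf-8 (fallback)
```

The returned name can then be passed as open(..., encoding=...) so the file matches what the site declared.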

III. Summary

1: Always specify the encoding explicitly when writing the file: encoding='utf-8'!
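A quick way to confirm the summary is to write the same text with and without an explicit UTF-8 encoding. The sketch below uses a temporary file and hypothetical data; try_write is an illustrative helper, not part of the original code. It also shows the errors argument of open(): if a GBK file is truly required, errors='replace' avoids the exception but silently swaps the character for '?', which is why UTF-8 is the safer choice.

```python
import os
import tempfile


def try_write(path: str, text: str, encoding: str, errors: str = "strict") -> bool:
    """Write `text` with an explicit encoding; return False if a UnicodeEncodeError occurs."""
    try:
        with open(path, "w", encoding=encoding, errors=errors) as f:
            f.write(text)
        return True
    except UnicodeEncodeError:
        return False


text = str([{"title": "abc\u2f64d"}])  # hypothetical scraped data
path = os.path.join(tempfile.mkdtemp(), "pqms.txt")

print(try_write(path, text, "gbk"))    # False: reproduces the original error
print(try_write(path, text, "utf-8"))  # True
with open(path, encoding="utf-8") as f:
    print(f.read() == text)            # True: the character survives the round trip

# If a GBK file is mandatory, errors="replace" avoids the exception but writes '?'
print(try_write(path, text, "gbk", errors="replace"))  # True
with open(path, encoding="gbk") as f:
    print(f.read() == text)            # False: U+2F64 was replaced by '?'
```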

IV. Full Code

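import requests

# Setup not shown in the original article (assumed):
session = requests.Session()
headers = {}      # the original request headers are not shown
baseUrl = "..."   # hypothetical placeholder: the real list API URL is not given
menuList = []


def getUrlList():
    global menuList
    for i in range(1, 153):
        print(f"Page {i}")
        url = baseUrl  # the original's per-page URL construction is not shown

        response = session.get(url, headers=headers)
        response.encoding = "utf-8"
        html = response.json()  # the endpoint returns JSON, so no HTML parsing is needed
        print("Returned data")
        # print(html)
        print(len(html['data']['list']))
        for m in html['data']['list']:
            menuList.append(m)
        if len(menuList) >= 1530:
            print(f"List URLs scraped: {len(menuList)} in total")
            # Write to a file, specifying UTF-8 explicitly
            with open('pqms.txt', 'w', encoding='utf-8') as f:
                f.write(str(menuList))
            # List URLs are done; next, scrape the details
            print("Saved to local file")
            break  # stop paging once the expected number of entries is collected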
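The full code needs a live endpoint, so it cannot be run as-is. The sketch below separates fetching from saving so the collect-and-write step can be tried offline. collect_and_save, pages, and limit are hypothetical names, and each page is assumed to have the same {'data': {'list': [...]}} shape that the code above reads.

```python
import os
import tempfile
from typing import Iterable


def collect_and_save(pages: Iterable[dict], path: str, limit: int) -> list:
    """Accumulate items from each page, stop once `limit` is reached, and save them as UTF-8."""
    items = []
    for page in pages:
        items.extend(page["data"]["list"])
        if len(items) >= limit:
            break
    # Explicit encoding: do not depend on the platform default (GBK on Chinese Windows)
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(items))
    return items


# Fake pages standing in for the API responses; each includes U+2F64
fake_pages = [{"data": {"list": [f"item{i}", "\u2f64"]}} for i in range(5)]
out = os.path.join(tempfile.mkdtemp(), "pqms.txt")
result = collect_and_save(fake_pages, out, limit=6)
print(len(result))  # 6
```

In the real scraper, pages would be the parsed JSON of each session.get(...) response.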