python学习要点（一） - luozhiyun`s Blog

列表和元组

列表是动态的，长度大小不固定，可以随意地增加、删减或者改变元素（mutable）。
而元组是静态的，长度大小固定，无法增加删减或者改（immutable）。

如果你想对已有的元组做任何"改变”，那就只能重新开辟一块内存，创建新的元组了。

如下：

tup = (1, 2, 3, 4)
new_tup = tup + (5, ) # 创建新的元组new_tup，并依次填充原元组的值 new _tup (1, 2, 3, 4, 5)

l = [1, 2, 3, 4]
l.append(5) # 添加元素5到原列表的末尾 l [1, 2, 3, 4, 5]

列表和元组存储方式的差异

由于列表是动态的，所以它需要存储指针，来指向对应的元素。另外，由于列表可变，所以需要额外存储已经分配的长度大小，这样才可以即使扩容。

l = [] 
l.__sizeof__() // 空列表的存储空间为40字节 
40 
l.append(1) l.__sizeof__() 
72 // 加⼊了元素1之后，列表为其分配了可以存储4个元素的空间 (72 - 40)/8 = 4 
l.append(2) 
l.__sizeof__() 
72 // 由于之前分配了空间，所以加⼊元素2，列表空间不变
l.append(3) 
l.__sizeof__() 
72 // 同上 
l.append(4) 
l.__sizeof__() 
72 // 同上 
l.append(5) 
l.__sizeof__() 
104 // 加⼊元素5之后，列表的空间不⾜，所以⼜额外分配了可以存储4个元素的空间

但是对于元组，情况就不同了。元组长度大小固定，元素不可变，所以存储空间固定。

列表和元组的性能

元组要比列表更加轻量级一些，所以总体上来说，元组的性能速度要略优于列表。

Python会在后台，对静态数据做一些资源缓存资源缓存（resource caching）。通常来说，因为垃圾回收机制的存在，如果一些变量不被使用了，Python就会回收它们所占用的内存，返还给操作系统，以便其他变量或其他应用使用。

但是对于一些静态变量，比如元组，如果它不被使用并且占用空间不大时，Python会暂时缓存这部分内存。

由下面例子元组的初始化速度，要比列表快5倍。

python3 -m timeit 'x=(1,2,3,4,5,6)' 
20000000 loops, best of 5: 9.97 nsec per loop 
python3 -m timeit 'x=[1,2,3,4,5,6]' 
5000000 loops, best of 5: 50.1 nsec per loop

字典和集合

集合和字典基本相同，唯一的区别，就是集合没有键和值的配对，是一系列无序的、唯一的元素组合。

如何访问、使用就不说了，说两个注意点:
1.Python 中字典和集合，无论是键还是值，都可以是混合类型

s = {1, 'hello', 5.0}

2.字典访问可以直接索引键，如果不存在，就会抛出异常；也可以使用 get(key, default) 函数来进行索引。如果键不存在，调用 get() 函数可以返回一个默认值。

d = {'name': 'jason', 'age': 20}
d['name']
'jason'
d['location']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'location'

d = {'name': 'jason', 'age': 20}
d.get('name')
'jason'
d.get('location', 'null')
'null

字典和集合的工作原理

字典和集合的内部结构都是一张哈希表。

对于字典而言，这张表存储了哈希值（hash）、键和值这 3 个元素。
而对集合来说，区别就是哈希表内没有键和值的配对，只有单一的元素了。

老版本 Python 的哈希表结构如下所示：

--+-------------------------------+
  | 哈希值 (hash)  键 (key)  值 (value)
--+-------------------------------+
0 |    hash0      key0    value0
--+-------------------------------+
1 |    hash1      key1    value1
--+-------------------------------+
2 |    hash2      key2    value2
--+-------------------------------+
. |           ...
__+_______________________________+

随着哈希表的扩张，它会变得越来越稀疏。举个例子：

{'name': 'mike', 'dob': '1999-01-01', 'gender': 'male'}

那么它会存储为类似下面的形式：

entries = [
['--', '--', '--']
[-230273521, 'dob', '1999-01-01'],
['--', '--', '--'],
['--', '--', '--'],
[1231236123, 'name', 'mike'],
['--', '--', '--'],
[9371539127, 'gender', 'male']
]

这样的设计结构显然非常浪费存储空间。

为了提高存储空间的利用率，现在的哈希表除了字典本身的结构，会把索引和哈希值、键、值单独分开：

Indices
----------------------------------------------------
None | index | None | None | index | None | index ...
----------------------------------------------------

Entries
--------------------
hash0   key0  value0
---------------------
hash1   key1  value1
---------------------
hash2   key2  value2
---------------------
        ...
---------------------

在新的哈希表结构下的存储形式，上面的例子就是这样：

indices = [None, 1, None, None, 0, None, 2]
entries = [
[1231236123, 'name', 'mike'],
[-230273521, 'dob', '1999-01-01'],
[9371539127, 'gender', 'male']
]

插入操作

每次向字典或集合插入一个元素时，Python 会首先计算键的哈希值（hash(key)），再和 mask = PyDicMinSize – 1 做与操作，计算这个元素应该插入哈希表的位置 index = hash(key) & mask。

如果哈希表中此位置是空的，那么这个元素就会被插入其中。而如果此位置已被占用，Python 便会比较两个元素的哈希值和键是否相等。

若两者中有一个不相等，这种情况我们通常称为哈希冲突（hash collision），意思是两个元素的键不相等，但是哈希值相等。这种情况下，Python 便会继续寻找表中空余的位置，直到找到位置为止。

删除操作

对于删除操作，Python 会暂时对这个位置的元素，赋于一个特殊的值，等到重新调整哈希表的大小时，再将其删除。

为了保证其高效性，字典和集合内的哈希表，通常会保证其至少留有 1/3 的剩余空间。随着元素的不停插入，当剩余空间小于 1/3 时，Python 会重新获取更大的内存空间，扩充哈希表。

函数变量作用域

局部变量

如果变量是在函数内部定义的，就称为局部变量，只在函数内部有效。一旦函数执行完毕，局部变量就会被回收，无法访问。

对于嵌套函数来说，内部函数可以访问外部函数定义的变量，但是无法修改，若要修改，必须加上 nonlocal 这个关键字：

def outer():
    x = "local"
    def inner():
        nonlocal x # nonlocal 关键字表示这里的 x 就是外部函数 outer 定义的变量 x
        x = 'nonlocal'
        print("inner:", x)
    inner()
    print("outer:", x)
outer()
# 输出
inner: nonlocal
outer: nonlocal

如果不加上 nonlocal 这个关键字，而内部函数的变量又和外部函数变量同名，那么同样的，内部函数变量会覆盖外部函数的变量。

def outer():
    x = "local"
    def inner():
        x = 'nonlocal' # 这里的 x 是 inner 这个函数的局部变量
        print("inner:", x)
    inner()
    print("outer:", x)
outer()
# 输出
inner: nonlocal
outer: local

全局变量

全局变量则是定义在整个文件层次上的，可以在文件内的任何地方被访问，但是不能在函数内部随意改变全局变量的值。
例如：

MIN_VALUE = 1
MAX_VALUE = 10
def validation_check(value):
    ...
    MIN_VALUE += 1
    ...
validation_check(5)

#输出
UnboundLocalError: local variable 'MIN_VALUE' referenced before assignment

因为，Python 的解释器会默认函数内部的变量为局部变量，但是又发现局部变量 MIN_VALUE 并没有声明，因此就无法执行相关操作。

如果我们一定要在函数内部改变全局变量的值，就必须加上 global 这个声明：

MIN_VALUE = 1
MAX_VALUE = 10
def validation_check(value):
    global MIN_VALUE
    ...
    MIN_VALUE += 1
    ...
validation_check(5)

如果遇到函数内部局部变量和全局变量同名的情况，那么在函数内部，局部变量会覆盖全局变量，比如下面这种：

MIN_VALUE = 1
MAX_VALUE = 10
def validation_check(value):
    MIN_VALUE = 3
    ...

闭包

闭包中外部函数返回的是一个函数，返回的函数通常赋于一个变量，这个变量可以在后面被继续执行调用。

比如，我们想计算一个数的 n 次幂，用闭包可以写成下面的代码：

def nth_power(exponent):
    def exponent_of(base):
        return base ** exponent
    return exponent_of # 返回值是 exponent_of 函数

square = nth_power(2) # 计算一个数的平方
cube = nth_power(3) # 计算一个数的立方 
square
# 输出
<function __main__.nth_power.<locals>.exponent(base)>

cube
# 输出
<function __main__.nth_power.<locals>.exponent(base)>

print(square(2))  # 计算 2 的平方
print(cube(2)) # 计算 2 的立方
# 输出
4 # 2^2
8 # 2^3

这里外部函数 nth_power() 返回值，是函数 exponent_of()，而不是一个具体的数值。

面对对象

函数

静态函数：与类没有什么关联可以用来做一些简单独立的任务，既方便测试，也能优化代码结构。静态函数可以通过在函数前一行加上 @staticmethod 来表示。

类函数：第一个参数一般为 cls，表示必须传一个类进来。类函数最常用的功能是实现不同的 init 构造函数，类似java中的构造器。类函数需要装饰器 @classmethod 来声明。

成员函数：是我们最正常的类的函数，它不需要任何装饰器声明，第一个参数 self 代表当前对象的引用，可以通过此函数，来实现想要的查询 / 修改类的属性等功能。

例子如下：

class Document():

    WELCOME_STR = 'Welcome! The context for this book is {}.'

    def __init__(self, title, author, context):
        print('init function called')
        self.title = title
        self.author = author
        self.__context = context

    # 类函数
    @classmethod
    def create_empty_book(cls, title, author):
        return cls(title=title, author=author, context='nothing')

    # 成员函数
    def get_context_length(self):
        return len(self.__context)

    # 静态函数
    @staticmethod
    def get_welcome(context):
        return Document.WELCOME_STR.format(context)

empty_book = Document.create_empty_book('What Every Man Thinks About Apart from Sex', 'Professor Sheridan Simove')

print(empty_book.get_context_length())
print(empty_book.get_welcome('indeed nothing'))

########## 输出 ##########

init function called
7
Welcome! The context for this book is indeed nothing.

继承

class Entity():
    def __init__(self, object_type):
        print('parent class init called')
        self.object_type = object_type

    def get_context_length(self):
        raise Exception('get_context_length not implemented')

    def print_title(self):
        print(self.title)

class Document(Entity):
    def __init__(self, title, author, context):
        print('Document class init called')
        Entity.__init__(self, 'document')
        self.title = title
        self.author = author
        self.__context = context

    def get_context_length(self):
        return len(self.__context)

class Video(Entity):
    def __init__(self, title, author, video_length):
        print('Video class init called')
        Entity.__init__(self, 'video')
        self.title = title
        self.author = author
        self.__video_length = video_length

    def get_context_length(self):
        return self.__video_length

harry_potter_book = Document('Harry Potter(Book)', 'J. K. Rowling', '... Forever Do not believe any thing is capable of thinking independently ...')
harry_potter_movie = Video('Harry Potter(Movie)', 'J. K. Rowling', 120)

print(harry_potter_book.object_type)
print(harry_potter_movie.object_type)

harry_potter_book.print_title()
harry_potter_movie.print_title()

print(harry_potter_book.get_context_length())
print(harry_potter_movie.get_context_length())

########## 输出 ##########

Document class init called
parent class init called
Video class init called
parent class init called
document
video
Harry Potter(Book)
Harry Potter(Movie)
77
120

我们可以从中抽象出一个叫做 Entity 的类，来作为Document 和 Video的父类。

每个类都有构造函数，继承类在生成对象的时候，是不会自动调用父类的构造函数的，因此你必须在 init() 函数中显式调用父类的构造函数。它们的执行顺序是子类的构造函数 -> 父类的构造函数。

由于父类的get_context_length方法是用来被重写的，所以使用 Entity 直接生成对象，调用 get_context_length() 函数，就会 raise error 中断程序的执行。

继承的优势：减少重复的代码，降低系统的熵值（即复杂度）。

抽象类

抽象类是一种特殊的类，它生下来就是作为父类存在的，一旦对象化就会报错。同样，抽象函数定义在抽象类之中，子类必须重写该函数才能使用。相应的抽象函数，则是使用装饰器 @abstractmethod 来表示。

抽象类就是这么一种存在，它是一种自上而下的设计风范，你只需要用少量的代码描述清楚要做的事情，定义好接口，然后就可以交给不同开发人员去开发和对接。

from abc import ABCMeta, abstractmethod

class Entity(metaclass=ABCMeta):
    @abstractmethod
    def get_title(self):
        pass

    @abstractmethod
    def set_title(self, title):
        pass

class Document(Entity):
    def get_title(self):
        return self.title

    def set_title(self, title):
        self.title = title

document = Document()
document.set_title('Harry Potter')
print(document.get_title())

entity = Entity()

########## 输出 ##########

Harry Potter

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-266b2aa47bad> in <module>()
     21 print(document.get_title())
     22 
---> 23 entity = Entity()
     24 entity.set_title('Test')

TypeError: Can't instantiate abstract class Entity with abstract methods get_title, set_title

代码中entity = Entity()直接报错，只有通过 Document 继承 Entity 才能正常使用。