Basic Questions

List five commonly used Python standard library modules

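The Python standard library is very extensive. It contains built-in modules (written in C) that provide access to system-level functionality such as I/O, as well as a large number of modules written in Python that provide standardized solutions to many everyday programming problems. Some of these modules are explicitly designed to make Python programs more portable by abstracting platform-specific features into platform-neutral APIs. The Python installers for Windows usually include the entire standard library and often many additional components. On Unix-like systems, Python is normally provided as a collection of packages, so it may be necessary to use the operating system's package manager to obtain some or all of the optional components.
If you are unsure which functions a module offers or how to use them, import the module and then call
dir(os) and help(os) to list its functions and read their documentation.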
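Operating system interface:
1.os: provides many functions for interacting with the operating system
    import os
    os.getcwd()  # return the current working directory
    os.chdir('home/bin')  # change the current working directory
    os.system('mkdir today')  # run the mkdir command in the system shell
    Note: always use import os rather than from os import *, so that the built-in
        open() function is not silently shadowed by os.open().
    For everyday file and directory management tasks, the shutil module provides a higher-level interface that is easier to use.
    import shutil
    shutil.copyfile('data.db','archive.db')
    shutil.move('build/executables','installdir')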
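2.File wildcards: the glob module provides a function for building file lists from wildcard searches in a directory
    import glob
    glob.glob('*.py')  # list every .py file in the current directory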
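3.Command-line arguments: sys. Common utility scripts often need to process command-line arguments. These arguments are stored as a list in the argv attribute of the sys module. When python demo.py one two is run from the command line:
    import sys
    print(sys.argv)
    Output: ['demo.py', 'one', 'two']
The argparse module provides a more sophisticated mechanism for processing command-line arguments. It should generally be preferred over handling sys.argv by hand.
    import argparse
    from getpass import getuser
    parser = argparse.ArgumentParser(description='An example')
    parser.add_argument('name',nargs='?',default=getuser(),help='The name of someone to greet')
    parser.add_argument('--verbose','-v',action='count',default=0)  # default=0 so the count is never None
    args = parser.parse_args()
    greeting = ['Hi','Hello','Greetings! its wonderful'][args.verbose % 3]
    print(f'{greeting},{args.name}')
    if not args.verbose:
        print('Try running this again with multiple "-v" flags!')
    The sys module also has stdin, stdout and stderr attributes. The last is useful for emitting warnings and error messages, because they remain visible even when stdout has been redirected.
    sys.stderr.write('Warning,log file not found starting\n')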
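4.String pattern matching: re, regular expressions
    import re
    re.findall(r'\bf[a-z]*','which foot or hand fell fastest')  # ['foot', 'fell', 'fastest']
    re.sub(r'(\b[a-z]+) \1',r'\1','cat in the the hat')  # 'cat in the hat' (the doubled word is removed)
    When only simple capabilities are needed, string methods are preferred because they are easier to read and debug
    'tea for too'.replace('too','two')  # 'tea for two'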
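5.Mathematics
The math module gives access to the underlying C library functions for floating-point math:
    import math
    math.cos(math.pi / 4)
    math.log(1024,2)
The random module provides tools for making random selections
    import random
    random.choice(['apple','pear','banana'])
    random.sample(range(100),10)  # sample 10 numbers without replacement
    random.random()  # random float in the range [0.0, 1.0)
    random.randrange(6)  # random integer chosen from range(6)
The statistics module calculates basic statistical properties of numeric data (mean, median, variance)
import statistics
data = [2.75,1.75,1.25,0.25,0.5,1.25]
statistics.mean(data)  # mean
statistics.median(data)  # median
statistics.variance(data)  # sample variance
The SciPy project <https://scipy.org> has many other modules for numerical computation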
6.Internet access
urllib.request retrieves data from URLs, and smtplib sends mail
from urllib.request import urlopen
with urlopen('http://tycho.usno.navy.mil') as response:
    for line in response:
        line = line.decode('utf-8')  # decode the bytes into text
        if 'EST' in line or 'EDT' in line:
            print(line)  # print the lines that contain Eastern time
import smtplib
server = smtplib.SMTP('localhost')  # requires a mail server running on localhost
server.sendmail('bayhax@163.com','whlbayahx@163.com',
"""To:whlbayahx@163.com
From:bayhax@163.com

fd
""")
server.quit()
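7.Dates and times
The datetime module supplies classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the implementation focuses on efficient member extraction for output formatting and manipulation.
    from datetime import date
    now = date.today()
    now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B")
    birthday = date(1949,10,1)
    age = now - birthday  # subtracting two dates yields a timedelta
    age.days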
8.Data compression: zlib gzip bz2 lzma zipfile tarfile
    import zlib
    s = b'witch which has which witchs wrist watch'
    len(s)
    t = zlib.compress(s)
    len(t)
    zlib.decompress(t)
    zlib.crc32(s)
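9.Performance measurement: timeit
    from timeit import Timer
    Timer('t=a;a=b;b=t','a=1;b=2').timeit()
    Timer('a,b=b,a','a=1;b=2').timeit()
    In contrast to timeit's fine level of granularity, the profile and pstats modules provide tools for identifying time-critical sections in larger blocks of code.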
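For the coarser-grained profiling mentioned above, a minimal sketch might look like the following. It uses cProfile (the C implementation of the profile interface) together with pstats. The slow_sum function is a hypothetical workload invented for this illustration, not part of the original notes.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares computed in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Collect the statistics into a string buffer instead of printing to stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # at most five entries, most expensive first
report = buffer.getvalue()
print(report)
```

The report lists call counts and cumulative times per function, which is usually enough to see where a larger program spends its time before reaching for timeit on a single statement.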
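10.Quality control
One approach to developing high-quality software is to write tests for each function as it is developed and to run those tests frequently during development.
The doctest module provides a tool for scanning a module and validating the tests embedded in a program's docstrings. Constructing a test is as simple as cutting and pasting a typical call together with its result into the docstring. This improves the documentation by giving the user an example, and it lets the doctest module make sure the code remains true to the documentation.
    def average(values):
        """Computes the arithmetic mean of a list of numbers.
        >>> print(average([20,30,70]))
        40.0
        """
        return sum(values) / len(values)
    import doctest
    doctest.testmod()  # automatically validate the embedded tests
The unittest module is not as effortless to use as doctest, but it allows a more comprehensive set of tests to be maintained in a separate file
    import unittest
    class TestStatisticalFunctions(unittest.TestCase):
        def test_average(self):
            self.assertEqual(average([20,30,70]),40.0)
            self.assertEqual(round(average([1,5,7]),1),4.3)
            with self.assertRaises(ZeroDivisionError):
                average([])
            with self.assertRaises(TypeError):
                average(20,30,70)
    unittest.main()  # calling from the command line runs all the tests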
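11.Batteries included
    Python has a "batteries included" philosophy. This is best seen through the sophisticated and robust capabilities of its larger packages. For example:
    The xmlrpc.client and xmlrpc.server modules make implementing remote procedure calls an almost trivial task. Despite the modules' names, no direct knowledge or handling of XML is needed.
    The email package is a library for managing email messages, including MIME and other RFC 2822-based message documents. Unlike smtplib and poplib, which actually send and receive messages, the email package has a complete toolset for building or decoding complex message structures (including attachments) and for implementing internet encoding and header protocols.
    The json package provides robust support for parsing this popular data interchange format. The csv module supports direct reading and writing of files in comma-separated value format, which is commonly supported by databases and spreadsheets. XML processing is supported by the xml.etree.ElementTree, xml.dom and xml.sax packages. Together, these modules and packages greatly simplify data interchange between Python applications and other tools.
    The sqlite3 module is a wrapper for the SQLite database library, providing a persistent database that can be updated and accessed using slightly nonstandard SQL syntax.
    Internationalization is supported by a number of modules, including gettext, locale, and the codecs package.
The modules that follow support professional programming needs and rarely occur in small scripts; most of the modules above are commonly used in scripts.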
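As a rough illustration of the data-interchange modules named above, the sketch below round-trips the same records through json, csv and sqlite3. The sample records, the table name and the column names are assumptions made up for this example.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample records used only for this illustration.
records = [{"name": "alice", "score": 90}, {"name": "bob", "score": 75}]

# json: serialize to a string and parse it back.
text = json.dumps(records)
restored = json.loads(text)

# csv: write the rows to an in-memory file, then read them back.
csv_buffer = io.StringIO(newline="")
writer = csv.DictWriter(csv_buffer, fieldnames=["name", "score"])
writer.writeheader()
writer.writerows(records)
csv_buffer.seek(0)
rows = list(csv.DictReader(csv_buffer))  # every value comes back as a string

# sqlite3: store the rows in an in-memory database and query them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (:name, :score)", records)
best = conn.execute(
    "SELECT name FROM scores ORDER BY score DESC LIMIT 1"
).fetchone()[0]
conn.close()

print(restored == records, rows[0]["score"], best)
```

Note that csv does not preserve types: the score read back from the CSV buffer is the string "90", whereas json and sqlite3 return an integer.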
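12.Output formatting
The reprlib module provides a version of repr() customized for abbreviated displays of large or deeply nested containers.
    import reprlib
    reprlib.repr(set('supercalifragilisticexpialidocious'))
The pprint module offers more sophisticated control over printing both built-in and user-defined objects in a way that is readable by the interpreter. When the result is longer than one line, the "pretty printer" adds line breaks and indentation to reveal the data structure more clearly.
    import pprint
    t = [[[['black','cyan'],'white',['green','red']],[['magenta','yellow'],'blue']]]
    pprint.pprint(t, width=30)
The textwrap module formats paragraphs of text to fit a given screen width
    import textwrap
    doc = """The wrap() method is just like fill() except that         it returns a list of strings instead of one big          string with newlines to separate the wrapped lines."""
    print(textwrap.fill(doc, width=40))
The locale module accesses a database of culture-specific data formats. The grouping parameter of locale's format_string() function provides a direct way of formatting numbers with group separators:
    import locale
    locale.setlocale(locale.LC_ALL,'English_United States.1252')  # Windows locale name; on Linux or macOS use e.g. 'en_US.UTF-8'
    conv = locale.localeconv()  # get a mapping of conventions
    x = 1234567.8
    locale.format_string("%d", x, grouping=True)  # locale.format() was removed in Python 3.12
    locale.format_string("%s%.*f",(conv['currency_symbol'],conv['frac_digits'],x),grouping=True)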
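13.Templates
The string module includes a versatile Template class with a simplified syntax suitable for editing by end users. It allows users to customize their applications without having to alter the application logic.
The format uses placeholder names formed by $ followed by a valid Python identifier. Surrounding the placeholder with braces allows it to be followed by more alphanumeric characters with no intervening spaces. Writing $$ creates a single escaped $
    from string import Template
    t = Template('${village}folk send $$110 to $cause')
    t.substitute(village='Nottingham',cause='the ditch fund')  # 'Nottinghamfolk send $110 to the ditch fund'
The substitute() method raises a KeyError when a placeholder is not supplied in a dictionary or a keyword argument. For mail-merge style applications, user-supplied data may be incomplete, and the safe_substitute() method may be more appropriate: it leaves placeholders unchanged if data is missing
    t = Template('Return the $item to $owner.')
    d = dict(item='unladen swallow')
    t.substitute(d)  # raises KeyError: 'owner'
    t.safe_substitute(d)  # 'Return the unladen swallow to $owner.'
Template subclasses can specify a custom delimiter. For example,
    a batch renaming utility for a photo browser
    import time, os.path
    photofiles = ['img_1074.jpg','img_1076.jpg','img_1077.jpg']
    class BatchRename(Template):
        delimiter = '%'
    fmt = input('Enter rename style (%d-date %n-seqnum %f-format): ')
    t = BatchRename(fmt)
    date = time.strftime('%d%b%y')
    for i,filename in enumerate(photofiles):
        base,ext = os.path.splitext(filename)
        newname = t.substitute(d=date,n=i,f=ext)
        print('{0} --> {1}'.format(filename, newname))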
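13.Working with binary data record layouts
The struct module provides pack() and unpack() functions for working with variable-length binary record formats. The example below loops through the header information of a ZIP file without using the zipfile module. Pack codes 'H' and 'I' represent two- and four-byte unsigned integers respectively. The "<" indicates that they are standard size and in little-endian byte order:
    import struct
    with open('myfile.zip','rb') as f:
        data = f.read()
    start = 0
    for i in range(3):  # show the first 3 file headers
        start += 14
        fields = struct.unpack('<IIIHH',data[start:start+16])
        crc32, comp_size, uncomp_size, filenamesize, extra_size = fields
        start += 16
        filename = data[start:start+filenamesize]
        start += filenamesize
        extra = data[start:start+extra_size]
        print(filename, hex(crc32), comp_size, uncomp_size)
        start += extra_size + comp_size  # skip to the next header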
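The ZIP example needs a real archive on disk. As a self-contained alternative, the sketch below packs and unpacks a small variable-length record entirely in memory. The record layout (a 4-byte id, a 2-byte length, then the name bytes) and the helper names pack_record and unpack_record are hypothetical, chosen only to illustrate the same pack codes.

```python
import struct

# Hypothetical record layout: a 4-byte unsigned id, a 2-byte unsigned
# length, then that many bytes of UTF-8 encoded name.
HEADER = "<IH"


def pack_record(record_id, name):
    payload = name.encode("utf-8")
    return struct.pack(HEADER, record_id, len(payload)) + payload


def unpack_record(data, offset=0):
    header_size = struct.calcsize(HEADER)  # 6 bytes for "<IH"
    record_id, length = struct.unpack_from(HEADER, data, offset)
    start = offset + header_size
    name = data[start:start + length].decode("utf-8")
    return record_id, name, start + length  # last item: offset of the next record


blob = pack_record(1, "first") + pack_record(2, "second")
rec1_id, rec1_name, next_offset = unpack_record(blob)
rec2_id, rec2_name, end_offset = unpack_record(blob, next_offset)
print(rec1_id, rec1_name, rec2_id, rec2_name)
```

Storing the length in the fixed-size header is what makes the variable-length part readable: the reader always knows how many bytes to consume before the next header begins.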
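14.Multi-threading
Threading is a technique for decoupling tasks that are not sequentially dependent. Threads can improve the responsiveness of applications that accept user input while other tasks run in the background. A related use case is running I/O in parallel with computations in another thread.
    import threading, zipfile
    class AsyncZip(threading.Thread):
        def __init__(self, infile, outfile):
            threading.Thread.__init__(self)
            self.infile = infile
            self.outfile = outfile
        def run(self):
            f = zipfile.ZipFile(self.outfile,'w',zipfile.ZIP_DEFLATED)
            f.write(self.infile)
            f.close()
            print('Finished background zip of:',self.infile)
    background = AsyncZip('mydata.txt','myarchive.zip')
    background.start()
    print('The main program continues to run in foreground')
    background.join()  # wait for the background task to finish
    print('Main program waited until background was done.')
The principal challenge of multi-threaded applications is coordinating threads that share data or other resources. To that end, the threading module provides a number of synchronization primitives, including locks, events, condition variables, and semaphores.
While those tools are powerful, minor design errors can result in problems that are difficult to reproduce. So the preferred approach to task coordination is to concentrate all access to a resource in a single thread and then use the queue module to feed that thread with requests from other threads. Applications that use Queue objects for inter-thread communication and coordination are easier to design, more readable, and more reliable.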
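The queue-based pattern described above can be sketched roughly as follows: one worker thread owns the shared list, and producer threads only put requests on a queue. The doubling step and the STOP sentinel are assumptions for illustration, not a prescribed design.

```python
import queue
import threading

jobs = queue.Queue()
results = []          # touched only by the worker thread
STOP = object()       # sentinel that tells the worker to exit


def worker():
    while True:
        item = jobs.get()
        if item is STOP:
            jobs.task_done()
            break
        results.append(item * 2)   # stand-in for real work on the resource
        jobs.task_done()


consumer = threading.Thread(target=worker)
consumer.start()


def produce(start):
    for n in range(start, start + 3):
        jobs.put(n)


producers = [threading.Thread(target=produce, args=(s,)) for s in (0, 10)]
for p in producers:
    p.start()
for p in producers:
    p.join()

jobs.put(STOP)        # queued after all real jobs, so nothing is lost
consumer.join()
print(sorted(results))
```

Because only the worker ever modifies results, no explicit lock is needed; the Queue object handles the synchronization between threads.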
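15.Logging
The logging module offers a full-featured and flexible logging system.
import logging
logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')
By default, informational and debugging messages are suppressed and the output is sent to standard error. Other output options include routing messages through email, datagrams, sockets, or to an HTTP server. New filters can select different routing based on message priority: DEBUG, INFO, WARNING, ERROR, and CRITICAL.
The logging system can be configured directly from Python or loaded from a user-editable configuration file, so that logging can be customized without altering the application.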
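Configuring logging directly from Python might look like the sketch below, which lowers the threshold to INFO and sets a custom format. The logger name "demo" and the in-memory stream are assumptions chosen to keep the example self-contained.

```python
import io
import logging

# Send records to an in-memory stream so the sketch is easy to inspect;
# a real application would more likely log to a file or to stderr.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("demo")      # hypothetical logger name
logger.setLevel(logging.INFO)           # let INFO through, still drop DEBUG
logger.addHandler(handler)
logger.propagate = False                # keep the output out of the root logger

logger.debug("not shown")
logger.info("service started")
logger.warning("config file %s not found", "server.conf")

output = stream.getvalue()
print(output)
```

Passing the file name as a separate argument, rather than formatting the string beforehand, lets logging skip the formatting work when the message is filtered out.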
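16.Weak references
Python does automatic memory management (reference counting for most objects and garbage collection to eliminate cycles). The memory is freed shortly after the last reference to an object has been eliminated.
This approach works fine for most applications, but occasionally there is a need to track objects only as long as they are being used by something else. Unfortunately, just tracking them creates a reference that makes them permanent. The weakref module provides tools for tracking objects without creating a reference. When the object is no longer needed, it is automatically removed from a weakref table and a callback is triggered for weakref objects. Typical applications include caching objects that are expensive to create:
    import weakref, gc
    class A:
        def __init__(self, value):
            self.value = value
        def __repr__(self):
            return str(self.value)
    a = A(10)  # create a reference
    d = weakref.WeakValueDictionary()
    d['primary'] = a  # does not create a reference
    d['primary']  # fetch the object if it is still alive
    del a  # remove the one reference
    gc.collect()  # run garbage collection right away
    d['primary']  # raises KeyError: the entry was automatically removed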
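The callback behaviour mentioned above can be observed with weakref.ref, as in this small sketch. The Resource class and the events list are hypothetical names used only for the demonstration, and the immediate collection shown here relies on CPython's reference counting.

```python
import gc
import weakref


class Resource:
    """Hypothetical object that is expensive to create."""

    def __init__(self, name):
        self.name = name


events = []


def on_collect(reference):
    # Called with the (now dead) weak reference once the object is gone.
    events.append("collected")


obj = Resource("big")
ref = weakref.ref(obj, on_collect)
alive_before = ref() is obj     # the weak reference resolves while obj lives

del obj                         # drop the only strong reference
gc.collect()                    # not strictly needed on CPython, harmless elsewhere
alive_after = ref() is not None
print(alive_before, alive_after, events)
```

Calling a dead weak reference returns None, so code that uses one should always check the result before touching the object.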
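17.Tools for working with lists
The array module provides an array() object that is like a list but stores only homogeneous data and stores it more compactly.
    from array import array
    a = array('H', [4000,10,700,20000])  # 'H' stores two-byte unsigned integers
    sum(a)
    a[1:3]
The collections module provides a deque() object that is like a list with faster appends and pops from the left side but slower lookups in the middle. It is well suited for implementing queues and breadth-first tree searches.
    from collections import deque
    d = deque(['task1','task2','task3'])
    d.append('task4')
    print('Handling',d.popleft())
    # starting_node, gen_moves() and is_goal() are application-specific and not defined here
    unsearched = deque([starting_node])
    def breadth_first_search(unsearched):
        node = unsearched.popleft()
        for m in gen_moves(node):
            if is_goal(m):
                return m
            unsearched.append(m)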
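The search fragment above depends on helpers that are not defined in these notes. A runnable variant, under the assumption of a small hand-written graph (the GRAPH dictionary below is invented for illustration), could look like this:

```python
from collections import deque

# Hypothetical graph used only for illustration: node -> neighbours.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def breadth_first_search(start, goal):
    """Return a shortest path from start to goal as a list, or None."""
    unsearched = deque([[start]])    # queue of partial paths
    seen = {start}
    while unsearched:
        path = unsearched.popleft()  # O(1) pop from the left end
        node = path[-1]
        if node == goal:
            return path
        for neighbour in GRAPH[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                unsearched.append(path + [neighbour])
    return None


path = breadth_first_search("A", "F")
print(path)
```

Unlike the fragment, this version loops until the queue is empty and tracks visited nodes, so it terminates on graphs with shared or cyclic edges.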
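In addition to alternative list implementations, the library also offers other tools such as the bisect module, which has functions for manipulating sorted lists:
    import bisect
    scores = [(100,'perl'),(200,'tcl'),(400,'lua'),(500,'python')]
    bisect.insort(scores, (300,'ruby'))  # inserted between (200,'tcl') and (400,'lua')
The heapq module provides functions for implementing heaps based on regular lists. The lowest-valued entry is always kept at position zero. This is useful for applications that repeatedly access the smallest element but do not want to run a full list sort.
from heapq import heapify, heappop, heappush
data = [1,3,5,6,7,9,2,4,6,8,0]
heapify(data)  # rearrange the list into heap order
heappush(data, -5)  # add a new entry
[heappop(data) for i in range(3)]  # fetch the three smallest entries: [-5, 0, 1]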
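18.Decimal floating-point arithmetic
The decimal module offers a Decimal datatype for decimal floating-point arithmetic. Compared to the built-in float implementation of binary floating point, the class is especially helpful for:
    financial applications and other uses that require exact decimal representation,
    control over precision,
    control over rounding to meet legal or regulatory requirements,
    tracking of significant decimal places, or
    applications where the user expects the results to match calculations done by hand.
    from decimal import *
    round(Decimal('0.70') * Decimal('1.05'), 2)  # Decimal('0.74')
The Decimal product (0.7350 before rounding) keeps a trailing zero, automatically inferring four-place significance from multiplicands with two-place significance. Decimal reproduces mathematics as done by hand and avoids issues that can arise when binary floating point cannot exactly represent decimal quantities.
Exact representation enables the Decimal class to perform modulo calculations and equality tests that are unsuitable for binary floating point
    Decimal('1.00') % Decimal('.10')  # Decimal('0.00')
    1.00 % 0.10   # 0.09999999999999995, a different result
    sum([Decimal('0.1')]*10) == Decimal('1.0')  # True
    sum([0.1]*10) == 1.0  # False
The decimal module provides arithmetic with as much precision as needed
getcontext().prec = 36
Decimal(1) / Decimal(7)  # Decimal('0.142857142857142857142857142857142857')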
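The point about controlling rounding can be illustrated with quantize(), which rounds a Decimal to a fixed exponent under an explicit rounding mode. The amounts below are arbitrary sample values, not taken from the original notes.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

cent = Decimal("0.01")
amount = Decimal("2.665")

half_up = amount.quantize(cent, rounding=ROUND_HALF_UP)      # ties go away from zero
half_even = amount.quantize(cent, rounding=ROUND_HALF_EVEN)  # ties go to the even digit

# The binary float 2.675 is stored slightly below 2.675, so it rounds down,
# while the exact Decimal value rounds up as expected.
float_rounded = round(2.675, 2)
decimal_rounded = Decimal("2.675").quantize(cent, rounding=ROUND_HALF_UP)

print(half_up, half_even, float_rounded, decimal_rounded)
```

Choosing the rounding mode explicitly, rather than relying on the context default, makes the intended rule visible at the point where money is rounded.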

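What are Python's built-in data types?

Numbers (int, float, complex) --- immutable
Strings (str) --- immutable
Tuples (tuple) --- immutable
Lists (list) --- mutable
Dictionaries (dict) --- mutable
Sets (set) --- mutable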

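Briefly describe what opening a file with the with statement does for us

f = open("./test.txt", "w")  # text mode, because a str is written below
try:
    f.write("hello world")
except OSError:
    pass
finally:
    f.close()
Reading or writing a file can raise exceptions. With a plain open() call, we need try-except-finally to handle them and to make sure that f.close() in the finally block runs no matter what happens. The with statement performs that f.close() for us automatically.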
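A short sketch of the with form, for comparison. It writes into a temporary directory (an assumption made so that the example leaves no files behind) and checks that the file is closed both after a normal exit and after an exception.

```python
import os
import tempfile

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.txt")

    with open(path, "w", encoding="utf-8") as f:
        f.write("hello world")
    closed_after_write = f.closed      # closed on leaving the block

    try:
        with open(path, encoding="utf-8") as g:
            raise ValueError("simulated failure while reading")
    except ValueError:
        pass
    closed_after_error = g.closed      # closed even though an exception was raised

print(closed_after_write, closed_after_error)
```

Note that with only guarantees the cleanup; the exception itself still propagates, so an except clause is needed wherever the error should actually be handled.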

List Python's mutable and immutable data types
```
Mutable: list dict set
Immutable: int str bool tuple
```
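The practical difference between the two groups shows up when two names refer to the same object. The following sketch uses made-up variable names to demonstrate it:

```python
numbers = [1, 2, 3]
alias = numbers            # both names refer to the same list object
alias.append(4)            # an in-place change is visible through both names
list_shared = numbers == [1, 2, 3, 4] and alias is numbers

text = "abc"
other = text
other += "d"               # builds a new str; the original is untouched
str_unchanged = text == "abc" and other == "abcd"

point = (1, 2, 3)
try:
    point[0] = 99          # tuples reject item assignment
    tuple_error = None
except TypeError as exc:
    tuple_error = type(exc).__name__

print(list_shared, str_unchanged, tuple_error)
```

This is also why only immutable (more precisely, hashable) values such as str, int and tuple can be used as dictionary keys or set members.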

How do you get the current date in Python?

import datetime
today = datetime.date.today()
print(today)
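If the date needs to be formatted or combined with a time, a sketch along these lines may help. The format string is just one common choice, not the only option.

```python
from datetime import date, datetime, timedelta

today = date.today()
now = datetime.now()

iso_text = today.isoformat()                      # formatted as YYYY-MM-DD
custom_text = now.strftime("%Y-%m-%d %H:%M:%S")   # date and time down to seconds
tomorrow = today + timedelta(days=1)              # date arithmetic with timedelta

print(iso_text, custom_text, tomorrow)
```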

Count how many times each word appears in a string

import io
import re
class Counter:
    def __init__(self, path):
        """
        :param path: path of the file to read
        """
        self.mapping = dict()
        with io.open(path, encoding="utf-8") as f:
            data = f.read()
            words = [s.lower() for s in re.findall(r"\w+", data)]
            for word in words:
                self.mapping[word] = self.mapping.get(word, 0) + 1
    def most_common(self, n):
        assert n > 0, "n should be greater than 0"
        return sorted(self.mapping.items(), key=lambda item: item[1], reverse=True)[:n]
if __name__ == '__main__':
    most_common_5 = Counter("三国演义.txt").most_common(5)
    for item in most_common_5:
        print(item)
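The standard library already ships a class for this job, collections.Counter. A minimal sketch that works on a string directly is shown below; the sample sentence is made up. Note that it shares its name with the hand-written Counter class above, so the two should not be mixed in one module.

```python
import re
from collections import Counter

# Hypothetical sample text; any string could be used here.
text = "the quick brown fox jumps over the lazy dog and the fox"

words = re.findall(r"\w+", text.lower())
counts = Counter(words)
top_two = counts.most_common(2)
print(top_two)
```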

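Deleting a file in Python and with Linux commands

In Python:
    import os
    if os.path.exists("path/to/file"):
        os.remove("path/to/file")
    else:
        print("The file does not exist")

    Deleting a directory
    import os
    os.rmdir("path/to/directory")  # only removes an empty directory
In Linux:
    rm file
    rm -r directory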
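Since os.rmdir() only works on empty directories, the closest Python counterpart to rm -r is shutil.rmtree(). The sketch below, which runs in a throwaway temporary directory (an assumption made for safety), shows both file and directory-tree deletion using pathlib.

```python
import shutil
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())          # scratch directory for the sketch

target = base / "notes.txt"
target.write_text("temporary", encoding="utf-8")
target.unlink()                          # delete a file, like os.remove()
file_gone = not target.exists()

target.unlink(missing_ok=True)           # Python 3.8+: no error if already gone

nested = base / "a" / "b"
nested.mkdir(parents=True)
(nested / "data.txt").write_text("x", encoding="utf-8")
shutil.rmtree(base / "a")                # delete a directory and its contents, like rm -r
tree_gone = not (base / "a").exists()

base.rmdir()                             # the scratch directory is empty again
print(file_gone, tree_gone)
```

shutil.rmtree() is as unforgiving as rm -r, so it is worth double-checking the path before calling it on anything outside a scratch directory.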

Write some code that raises a custom exception

# raise an exception with a custom message
def fn():
    try:
        for i in range(5):
            if i > 2:
                raise Exception("the number is greater than 2")
    except Exception as ret:
        print(ret)
fn()
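The snippet above raises the built-in Exception with a custom message. A custom exception in the stricter sense is a dedicated subclass, which lets callers catch exactly that error. A minimal sketch follows; the class name NumberTooLargeError and the check helper are hypothetical.

```python
class NumberTooLargeError(Exception):
    """Hypothetical custom exception raised when a value exceeds a limit."""

    def __init__(self, value, limit):
        super().__init__(f"{value} is greater than {limit}")
        self.value = value
        self.limit = limit


def check(value, limit=2):
    if value > limit:
        raise NumberTooLargeError(value, limit)
    return value


try:
    for i in range(5):
        check(i)
except NumberTooLargeError as err:
    message = str(err)
    failed_value = err.value    # extra attributes carry structured details
    print(message)
```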

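Explain, with an example, what try, except, else and finally mean in exception handling

try:
    the statements whose errors should be handled
except ErrorType as e:
    the message to report or the steps to take when this error occurs
except ErrorType as e:
    same as above
else:
    statements that run only when no error occurred
finally:
    statements that always run last, whether or not an error occurred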
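The outline above can be turned into a runnable sketch that records which clauses execute. The divide function and its steps list are invented for this demonstration.

```python
def divide(a, b):
    """Return (result, steps) where steps records which clauses ran."""
    steps = []
    result = None
    try:
        steps.append("try")
        result = a / b
    except ZeroDivisionError:
        steps.append("except")      # runs only when the division fails
    else:
        steps.append("else")        # runs only when no exception was raised
    finally:
        steps.append("finally")     # always runs
    return result, steps


ok = divide(6, 3)
failed = divide(1, 0)
print(ok)
print(failed)
```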

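How do you deal with a bug?

Wrap the suspect code in try-except-finally to capture the error message,
then handle the problem according to that message.

For a quick and direct approach without try-except, add print statements
to locate where the error occurs.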
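When the message alone is not enough, the full traceback usually points at the exact line. One possible way to capture it inside an except block is sketched here; parse_age is a hypothetical function standing in for the buggy code. The built-in breakpoint() function (Python 3.7+) is another option for stepping through code interactively.

```python
import traceback


def parse_age(text):
    return int(text)     # raises ValueError for non-numeric input


try:
    parse_age("abc")
except ValueError:
    # format_exc() returns the same text the interpreter would print,
    # including the file name, line number and failing call.
    details = traceback.format_exc()
    print(details)
```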
