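### MongoDB
Installing MongoDB itself is not covered here.
Installing pymongo:
1. pip install wheel
2. Download the whl file that matches your Python version and platform from
https://pypi.python.org/pypi/pymongo#downloads
3. pip install, followed by the correct path to the downloaded whl file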
Inserting data

```python
import pymongo

client = pymongo.MongoClient('localhost', 27017)
dbname = client['dbname']    # database
tbname = dbname['tbname']    # collection

path = 'demo.txt'
with open(path, 'r') as f:
    lines = f.readlines()
    for index, line in enumerate(lines):
        # one document per line: position, raw text, and word count
        data = {
            'index': index,
            'line': line,
            'words': len(line.split())
        }
        tbname.insert_one(data)
```
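As a rough sketch, the per-line loop above could also be split into a pure function that builds the documents and a helper that writes them in one batch. The names `build_documents` and `insert_lines` are hypothetical and not from the original post; `insert_many` is PyMongo's bulk counterpart to `insert_one`, and the helper assumes `collection` is a PyMongo collection such as `tbname`.

```python
def build_documents(lines):
    """Turn text lines into documents with the same fields as the loop above."""
    return [
        {'index': index, 'line': line, 'words': len(line.split())}
        for index, line in enumerate(lines)
    ]


def insert_lines(collection, lines):
    """Insert all lines in one batch and return how many were written.

    `collection` is assumed to be a PyMongo collection (hypothetical helper).
    """
    docs = build_documents(lines)
    if docs:  # insert_many() rejects an empty list, so skip it in that case
        collection.insert_many(docs)
    return len(docs)
```

With the connection from the example above, this might be called as `insert_lines(tbname, open('demo.txt').readlines())`. Batching mainly saves round trips to the server when the file has many lines.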
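Reading data

```python
import pymongo

client = pymongo.MongoClient('localhost', 27017)
dbname = client['dbname']
tbname = dbname['tbname']  # the collection must be looked up before querying

# every document in the collection
for item in tbname.find():
    print(item)

# only documents whose 'words' field is 0, i.e. blank lines
for item in tbname.find({'words': 0}):
    print(item)
```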
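The filter passed to `find()` is an ordinary dict, so it can be built in code. The sketch below is an illustration rather than part of the original post: `words_filter` is a hypothetical helper, while `$gte` and `$lte` are MongoDB's standard comparison operators.

```python
def words_filter(min_words=None, max_words=None):
    """Build a find() filter on the 'words' field (hypothetical helper)."""
    bounds = {}
    if min_words is not None:
        bounds['$gte'] = min_words  # greater than or equal
    if max_words is not None:
        bounds['$lte'] = max_words  # less than or equal
    # An empty filter matches every document, like find() with no arguments.
    return {'words': bounds} if bounds else {}


# Exact match, as in the query above: lines that contain no words.
blank_lines = {'words': 0}
# Range match: lines with at least 5 words.
long_lines = words_filter(min_words=5)
```

Assuming the `tbname` collection from the example above, `tbname.find(long_lines)` would then iterate over the longer lines only.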
Large-scale data scraping
Example code

```python
# page_parsing.py (file name inferred from the import in the second script)
from bs4 import BeautifulSoup
import requests
import time
import pymongo

client = pymongo.MongoClient('localhost', 27017)
db58 = client['db58']
tb58 = db58['tb58']
tb58item = db58['tb58item']


def get_links_from(channel, pages):
    list_view = '{}/pn{}'.format(channel, str(pages))
    wb_data = requests.get(list_view)
    time.sleep(1)  # pause between requests
    soup = BeautifulSoup(wb_data.text, 'lxml')
    if soup.find('td', 't'):
        for link in soup.select('td.t > a.t'):
            item_link = link.get('href').split('?')[0]
            # insert_one() needs a dict, not a bare string;
            # the 'url' key is a naming choice made here
            tb58.insert_one({'url': item_link})
            print(item_link)
    else:
        pass  # no listing rows on this page


def get_all_links_from(channel):
    for num in range(1, 101):
        get_links_from(channel, num)
```

A separate script prints the number of documents in tb58item every 5 seconds:

```python
import time
from page_parsing import tb58item

while True:
    # count_documents({}) replaces Cursor.count(), which was removed in PyMongo 4
    print(tb58item.count_documents({}))
    time.sleep(5)
```
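The URL handling in the crawler can be separated from the network and database calls, which makes it easier to check on its own. The following is a minimal sketch under assumptions: the helper names are hypothetical, the `example.com` addresses are placeholders, and unlike the `split('?')[0]` call above, `strip_query` also drops a `#fragment`.

```python
from urllib.parse import urlsplit, urlunsplit


def build_list_url(channel, page):
    """Return the URL of one list page, in the '<channel>/pn<page>' form."""
    # rstrip avoids a double slash when the channel URL already ends with '/'
    return '{}/pn{}'.format(channel.rstrip('/'), page)


def strip_query(href):
    """Drop the query string and fragment from a link."""
    parts = urlsplit(href)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))


def unique_links(hrefs):
    """Clean each link and remove duplicates, keeping first-seen order."""
    return list(dict.fromkeys(strip_query(h) for h in hrefs))


if __name__ == '__main__':
    print(build_list_url('http://example.com/shouji/', 3))
    print(unique_links([
        'http://example.com/a.shtml?psid=1',
        'http://example.com/a.shtml?psid=2',
        'http://example.com/b.shtml',
    ]))
```

Removing duplicates before inserting keeps the same item link from being stored several times when it appears on more than one list page.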