Things that may need improvement

  • performance improvement
  • better management UI support:
    • TS-644: the http_ui cache lookup interface will crash traffic_server in full cluster mode. FIXED
    • when we work on the new CLI/web UI, what should it cover for clustering?
  • wccp and cluster:
    • How do we deal with a WCCP switch when we are in full cluster mode? Will WCCP benefit from the cluster machine table that is already on every machine?
    • the WCCP code in the tree is not cluster-aware and cannot be used in a cluster environment.
    • then again, who cares about running TS with WCCP in reverse proxy?
  • management beyond cluster:
    • when clustering is enabled, config files are easy to sync inside the cluster, which is great for big deployments. Are there any issues with this?
    • How do we deal with many clusters? For example, pushing configuration to multiple clusters.
    • how do we sync config files for plugins?
  • cluster wide stats:
    • is it working? does it need more improvement? – No, it was broken; completely FIXED after Yunkai's work on stats
  • cluster & single TCP connection:
    • https://issues.apache.org/jira/browse/TS-1222 a single TCP connection limits cluster throughput. FIXED:
      you can now configure the number of cluster threads and internal connections, which increases performance
    • the network layer is handled by the ET_NET threads, which makes those threads easy to overload
    • how much traffic can one TCP connection handle, if CPU is not the limit?
    • consider a cluster of only 10 boxes where each box has 40 cores: how can we benefit from that many cores?
    • why should we keep a dedicated ET_CLUSTER thread pool, which causes so much context switching between ET_NET and ET_CLUSTER?
    • should we do full thread (CPU) matched binding? for example: each ET_NET thread #0 connects to the corresponding ET_NET #0 on the other hosts in the cluster.
  • when there are some very hot URLs, they can take down the whole cluster: consider 10 boxes, each with a 1 Gbps interface, handling at most 10 Gbps in total. If one hot object draws more than 1 Gbps of traffic, the box that owns that object receives all of it; the overload makes cluster communication fail, the cluster takes that box out, the next box to own the content gets all the traffic and is crushed in turn, and things become a mess. HOW can we handle this?
    • the memory-hit code in the cache does not help here
    • the TS cluster hash cannot make such a URL cacheable locally. --FIXED
    • we should cache locally before a request goes to the remote cluster node. --FIXED
      we have added a new cache control directive that can force local caching in clustering mode (see the sketch below).
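
A minimal cache.config sketch of what such a rule could look like. The directive name is not stated above; the cluster-cache-local action documented in later ATS releases is assumed here, and the URL pattern is purely an example:

      # hypothetical hot object: force it to be cached on every box instead of its single cluster owner
      url_regex=example.com/hot/.* action=cluster-cache-local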

Architecture improvements:

  • how do we deal with servers that have multiple interfaces/IP addresses? when we have 4+ interfaces, someone may need to set up more links than 1+1.
  • not sure how the cluster handles connections between cluster members; can one TCP connection carry 1 gigabit of traffic?
  • should we build a dedicated networking layer for the cluster, to avoid the coupling with ET_NET?
  • should we consider changing the inter-cluster protocol, i.e. using UDP instead of the current TCP, or another high-performance protocol?

Things done

  • cluster-wide http_ui, or other http_ui-style inspectors:
    • cache/lookup_url and cache/lookup_regex do not return cluster-wide objects. FIXED

Refine cluster (TS-2005)

  • what problems do we want to solve?
    • CPU usage
      on our 16-core systems we found that the ET_CLUSTER threads use more CPU than ET_NET; we have to add more cluster threads, up to 8-12, to avoid hitting the CPU limit of ET_CLUSTER.
    • throughput per cluster connection
      each cluster connection can only provide 300 Mbps of traffic, but we need more for high-traffic deployments
    • big cluster issue
      we need to handle about 200 hosts per cluster
  • what we do:
    • make the cluster a pure message-driven layer, with no more VC splicing on each side
    • clean up the message encapsulation and callback implementation
    • modify the cache cluster interface
  • what we don't change:
    • most of the high-level cluster API
    • most of the high-level cache interface
  • what we have done:
    • most of the cluster message rework
    • better performance
  • what we still need to do:
    • rework the ram cache, so that ET_CLUSTER can read the ram cache and respond directly

Documentation for the refine_cluster tree:

We have completely reworked the cluster message system, as well as the critical configuration interfaces. Here is the configuration and stats information.

The new configuration for threading and connections

proxy.config.cluster.threads Existing directive, default 1. Better to adjust it to your hardware, for example 4 on a 16-core system.
proxy.config.cluster.connections New directive, default 0. The default depends on the number of threads: at least 1 connection per thread, and at least 2 connections to each host.
proxy.config.cluster.peer_timeout Existing directive, controls how aggressively we declare a host down. Default 30; raise it for very big clusters, for example set it to 50 for 200 hosts.
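
As a quick illustration, a records.config excerpt for a 16-core box in a roughly 200-host cluster might look like this (values follow the guidance above and are examples, not recommendations):

      CONFIG proxy.config.cluster.threads INT 4
      CONFIG proxy.config.cluster.connections INT 8
      CONFIG proxy.config.cluster.peer_timeout INT 50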

New TS cluster configs

  1. session configs
  • proxy.config.cluster.max_sessions_per_machine: maximum number of sessions we can hold; depends on the QPS you need to handle, a good setting is QPS * 2
  • proxy.config.cluster.session_locks_per_machine: number of session locks, if it matters; example values: 673, 1361, 2729, 5471
  • proxy.config.cluster.read_buffer_size: cluster read buffer size, default 2MB; should be between 64KB and 2MB
  2. flow control configs
  • proxy.config.cluster.flow_ctrl.min_loop_interval: minimum loop interval, in us
  • proxy.config.cluster.flow_ctrl.max_loop_interval: maximum loop interval, in us
  • proxy.config.cluster.flow_ctrl.min_send_wait_time: minimum time to wait when sending, in us
  • proxy.config.cluster.flow_ctrl.max_send_wait_time: maximum time to wait when sending, in us
  • proxy.config.cluster.flow_ctrl.min_bps: sending rate (bps) below which flow control is not applied, the lower limit
  • proxy.config.cluster.flow_ctrl.max_bps: sending rate (bps) at which flow control is fully applied, the upper limit
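
A records.config sketch of these settings, assuming for illustration a box that serves about 10,000 QPS (all values are hypothetical and only show the format; 20000 is QPS * 2 per the guidance above and 1361 is one of the example lock counts):

      CONFIG proxy.config.cluster.max_sessions_per_machine INT 20000
      CONFIG proxy.config.cluster.session_locks_per_machine INT 1361
      CONFIG proxy.config.cluster.read_buffer_size INT 2097152
      # flow control: scale the loop interval and send wait time between their
      # minimums and maximums as the cluster send rate rises from 100 Mbps to 1 Gbps
      CONFIG proxy.config.cluster.flow_ctrl.min_loop_interval INT 100
      CONFIG proxy.config.cluster.flow_ctrl.max_loop_interval INT 1000
      CONFIG proxy.config.cluster.flow_ctrl.min_send_wait_time INT 100
      CONFIG proxy.config.cluster.flow_ctrl.max_send_wait_time INT 1000
      CONFIG proxy.config.cluster.flow_ctrl.min_bps INT 100000000
      CONFIG proxy.config.cluster.flow_ctrl.max_bps INT 1000000000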

The loop_interval was introduced to reduce CPU overhead and to increase the number of bytes sent and received per network call.
If one round of processing takes less time (time_used) than (loop_interval + 100us), the thread sleeps for (loop_interval - time_used) microseconds; otherwise that round does not sleep.

The interval is recomputed dynamically once per second, as follows:
1) If the cluster's current sending rate (bps) is lower than min_bps, or max_bps is 0, use the minimum value, i.e. min_loop_interval.
Note: the cluster's current sending rate (bps) is generated/computed once per second.
2) Otherwise, the formula is: min_time + (max_time - min_time) * (bps / max_bps)
Note: bps / max_bps is normalized, i.e. values greater than 1.0 are clamped to 1.0.
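
A worked example with assumed values (all numbers hypothetical): if min_loop_interval = 100us, max_loop_interval = 1000us, max_bps = 1,000,000,000 and the measured send rate is 400,000,000 bps, then bps / max_bps = 0.4 and the interval becomes 100 + (1000 - 100) * 0.4 = 460us. The send wait time is presumably derived the same way from min_send_wait_time and max_send_wait_time.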

The two dynamically computed values are:

  • proxy.process.cluster.io.send_wait_time: the wait time before sending data
  • proxy.process.cluster.io.loop_interval: the minimum time one loop iteration should take

  3. cluster ping configs
  • proxy.config.cluster.ping_send_interval_msecs: interval between pings. Unit: ms
  • proxy.config.cluster.ping_latency_threshold_msecs: maximum time to wait for a ping response; if no response arrives within this time, the ping times out. Unit: ms
  • proxy.config.cluster.ping_retries: number of retries. If retries + 1 consecutive attempts all fail, the peer is considered disconnected and the socket is closed.
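
For example, to ping peers every second, time a ping out after 5 seconds, and drop a peer after 3 failed retries, the records.config lines might look like this (values are illustrative only):

      CONFIG proxy.config.cluster.ping_send_interval_msecs INT 1000
      CONFIG proxy.config.cluster.ping_latency_threshold_msecs INT 5000
      CONFIG proxy.config.cluster.ping_retries INT 3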

Stats that will help you understand the cluster internals

The cluster IO stats are as follows:

  • proxy.process.cluster.io.send_msg_count: number of messages sent
  • proxy.process.cluster.io.drop_msg_count: number of messages dropped; unsent messages are only dropped when the peer connection goes down
  • proxy.process.cluster.io.send_bytes: number of bytes sent
  • proxy.process.cluster.io.drop_bytes: number of bytes dropped
  • proxy.process.cluster.io.call_writev_count: number of writev calls
  • proxy.process.cluster.io.recv_msg_count: number of messages received
  • proxy.process.cluster.io.recv_bytes: number of bytes received
  • proxy.process.cluster.io.call_read_count: number of read calls
  • proxy.process.cluster.io.send_delayed_time: total send delay. Unit: ns
  • proxy.process.cluster.io.push_msg_count: number of messages pushed into the message queue
  • proxy.process.cluster.io.push_msg_bytes: number of bytes pushed into the message queue

Average bytes per send: send_bytes / call_writev_count
Average bytes per receive: recv_bytes / call_read_count
Messages waiting to be sent = push_msg_count - (send_msg_count + drop_msg_count)
Bytes waiting to be sent = push_msg_bytes - (send_bytes + drop_bytes)
Average send delay = send_delayed_time / send_msg_count
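
A worked example with hypothetical numbers (the counters themselves can be read with traffic_line -r <stat name>): if send_bytes is 3,200,000,000 and call_writev_count is 100,000, the average write is 3,200,000,000 / 100,000 = 32,000 bytes per writev call; if push_msg_count is 1,050,000, send_msg_count is 1,000,000 and drop_msg_count is 0, then 50,000 messages are still queued waiting to be sent.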

Cluster ping stats

  • proxy.process.cluster.ping_total_count: total number of pings
  • proxy.process.cluster.ping_success_count: number of successful pings
  • proxy.process.cluster.ping_time_used: total ping-pong round-trip time. Unit: ns

Average ping time: ping_time_used / ping_success_count

Session stats

server session

The server automatically creates a session when it receives the first message of a session.

  • proxy.process.cluster.server_session.create_total_count: total number of create attempts
  • proxy.process.cluster.server_session.create_success_count: number of successful creates
  • proxy.process.cluster.server_session.create_retry_times: number of retries; equals create_success_count
  • proxy.process.cluster.server_session.close_total_count: total number of close attempts
  • proxy.process.cluster.server_session.close_success_count: number of successful closes
  • proxy.process.cluster.server_session.miss_count: number of times a session could not be found
  • proxy.process.cluster.server_session.occupied_count: number of times a session was already occupied by someone else

client session

The client must create a session before initiating a session request.

  • proxy.process.cluster.client_session.create_total_count: total number of create attempts
  • proxy.process.cluster.client_session.create_success_count: number of successful creates
  • proxy.process.cluster.client_session.create_retry_times: number of retries; if the first attempt succeeds, the count is 1
  • proxy.process.cluster.client_session.close_total_count: total number of close attempts
  • proxy.process.cluster.client_session.close_success_count: number of successful closes
  • proxy.process.cluster.client_session.miss_count: number of times a session could not be found
  • proxy.process.cluster.client_session.occupied_count: number of times a session was already occupied by someone else
  1. Anonymous

    We are interested in using ATS in our EC2 environment. I am trying to find information on using 2 ATS servers with a common cache, using our 4 web servers as origin servers, but I can't find any documentation on how to properly set this up. Any help?

      1. Anonymous

        The url is not found....

  2. Anonymous

    How do I get the URL? I'm eager to get a copy!

  3. Anonymous

    Hello,

    How do I set a virtual (failover) IP for my cluster?

    thanks,

    Martijn