SP-Cache: Load-balanced, Redundancy-free Cluster Caching with Selective Partition

Wei Wang, HKUST


Memory trumps disk: RAM throughput has improved exponentially, while disk I/O throughput has stalled.

Outline of this talk: prior work, SP-Cache, performance evaluation, summary.


Replication is too costly in memory.

> 1× memory overhead per replica.

> Hot files are large in size.

Erasure coding

(k, n) code: split a file into k chunks and compute n-k parity chunks.

Any k of the n chunks suffice to decode the original file.

> Less memory overhead: (n-k)/k.

The state of the art is EC-Cache [Rashmi et al., OSDI '16].
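
To make the overhead comparison concrete, here is a minimal Python sketch. The function names and the (10, 14) parameters are illustrative assumptions, not values taken from these notes.

```python
def replication_overhead(total_copies: int) -> float:
    """Extra memory, as a multiple of the file size, when keeping `total_copies` copies."""
    if total_copies < 1:
        raise ValueError("need at least one copy")
    return float(total_copies - 1)


def erasure_coding_overhead(k: int, n: int) -> float:
    """Extra memory for a (k, n) code: n - k parity chunks, each 1/k of the file."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return (n - k) / k


# Illustrative parameters only (assumption): one extra replica vs. a (10, 14) code.
print("replication, 2 copies:", replication_overhead(2))
print("(10, 14) code:", erasure_coding_overhead(10, 14))
```

Under these assumed parameters, a single extra replica costs a full extra copy of the file, while the code stores only a fraction of one.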

Mitigating hot spots with partition.

Split files into multiple partitions, each cached by a server chosen uniformly at random.
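
A rough Python sketch of this split-and-place step. The helper names are hypothetical, and placing partitions on distinct servers is an assumption made here for simplicity.

```python
import random


def split_into_partitions(data: bytes, k: int) -> list[bytes]:
    """Split data into k contiguous partitions of near-equal size."""
    if k < 1:
        raise ValueError("k must be at least 1")
    base, extra = divmod(len(data), k)
    parts, start = [], 0
    for i in range(k):
        # The first `extra` partitions absorb the remainder, one byte each.
        end = start + base + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def place_partitions(k: int, num_servers: int, rng: random.Random) -> list[int]:
    """Pick k distinct server ids uniformly at random (assumes k <= num_servers)."""
    return rng.sample(range(num_servers), k)


rng = random.Random(0)
parts = split_into_partitions(b"example file contents", 4)
servers = place_partitions(len(parts), num_servers=10, rng=rng)
for server, part in zip(servers, parts):
    print(server, part)
```

Reading the file back is just concatenating the partitions in order, which is why no decoding step is needed.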

Potential benefits of partition.

Spread the load of hot files across multiple servers.

No encoding/decoding overhead.

No redundancy, hence the highest memory efficiency.

Increasing read parallelism speeds up I/O.

However, reading partitions from multiple servers is susceptible to stragglers.

How many partitions should a file be split into?

> Too few partitions cannot mitigate the load of hot spots.

> Too many partitions make reads susceptible to stragglers.
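
The straggler side of this trade-off can be illustrated with a short calculation. It assumes each server is a straggler independently with probability p (a simplification not stated in the talk). A read that touches k servers is then delayed with probability 1 - (1 - p)^k.

```python
def prob_hit_straggler(k: int, p: float) -> float:
    """Probability that at least one of k independently chosen servers straggles."""
    if k < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need k >= 1 and 0 <= p <= 1")
    return 1.0 - (1.0 - p) ** k


# p = 0.05 is an illustrative value, not a measurement from the talk.
for k in (1, 4, 16, 64):
    print(k, round(prob_hit_straggler(k, 0.05), 3))
```

The probability grows quickly with k, which is why splitting every file into many partitions backfires.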

Intuition: selectively split hot files.

selective partition

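Split files into partitions based on their (expected) loads.

$k_i = \lceil \alpha L_i \rceil$, where $k_i$ is the number of partitions of file $i$, $\alpha$ is the scale factor, and $L_i$ is the load of file $i$.

Equalized per-partition load: $L_i / k_i \approx 1/\alpha$ for every split file.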
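
A small Python sketch of the rule, assuming the partition count of a file is its load times the scale factor, rounded up, with at least one partition per file. The load values and the scale factor below are made up for illustration.

```python
import math


def partition_counts(loads: list[float], alpha: float) -> list[int]:
    """Number of partitions per file: ceil(alpha * load), never fewer than one."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return [max(1, math.ceil(alpha * load)) for load in loads]


def per_partition_loads(loads: list[float], counts: list[int]) -> list[float]:
    """Load carried by each partition of every file."""
    return [load / k for load, k in zip(loads, counts)]


loads = [8.0, 2.0, 0.5]  # hypothetical loads: one hot file, one warm, one cold
counts = partition_counts(loads, alpha=1.0)
print(counts)
print(per_partition_loads(loads, counts))
```

Hot files are split until each partition carries at most 1/alpha load, while cold files stay in one piece and avoid the straggler penalty.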

Theorem: compared to EC-Cache with a (k, n) coding scheme, SP-Cache improves load balancing by O(…) in a large cluster.


Determining the scale factor.

Model selective partition as a fork-join queue.

Upper-bound analysis for the mean read latency.

Just enough partitions to achieve load balancing.
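
The fork-join intuition can be shown with a toy calculation. This is not the bound from the talk, and it ignores queueing delay entirely. It assumes a file that takes time T to read from one server is split into k partitions, each read in parallel with an exponentially distributed time of mean T/k. The read finishes when the slowest partition arrives, and the expected maximum of k such times is (T/k) · H_k, where H_k is the k-th harmonic number.

```python
def harmonic(k: int) -> float:
    """k-th harmonic number H_k = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / j for j in range(1, k + 1))


def toy_fork_join_latency(file_time: float, k: int) -> float:
    """Expected time until the slowest of k parallel partition reads finishes.

    Assumes i.i.d. exponential partition read times with mean file_time / k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return (file_time / k) * harmonic(k)


for k in (1, 2, 4, 8, 16):
    print(k, round(toy_fork_join_latency(1.0, k), 3))
```

Even in this idealized model the speedup from extra partitions is sublinear, because the read always waits for the slowest partition.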

Periodic balancing with re-partitioning.

The master launches multiple clients to re-partition files in parallel.
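
A sketch of what one balancing round could look like, using a thread pool to stand in for the parallel clients. The names `plan_repartition`, `repartition` and `rebalance` are hypothetical, and touching only the files whose partition count changed is an assumption, not something stated in the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_repartition(current: dict[str, int], target: dict[str, int]) -> list[str]:
    """Files whose target partition count differs from the current one."""
    return [name for name, k in target.items() if current.get(name) != k]


def repartition(name: str, k: int) -> tuple[str, int]:
    """Hypothetical client task: a real client would re-split and re-place the file."""
    return name, k


def rebalance(current: dict[str, int], target: dict[str, int],
              num_clients: int = 4) -> dict[str, int]:
    """Re-partition all changed files in parallel and return the new layout."""
    changed = plan_repartition(current, target)
    updated = dict(current)
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        results = pool.map(lambda name: repartition(name, target[name]), changed)
        updated.update(results)
    return updated


print(rebalance({"a": 2, "b": 1}, {"a": 8, "b": 1}))
```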

File popularity follows a Zipf distribution with exponent parameter 1.05 (high skewness).
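
To get a feel for how skewed this workload is, the following sketch builds a Zipf popularity vector where the file of rank r gets weight 1 / r^1.05. The number of files is an arbitrary assumption.

```python
def zipf_popularity(num_files: int, exponent: float = 1.05) -> list[float]:
    """Normalized Zipf popularity: the file of rank r gets weight 1 / r**exponent."""
    if num_files < 1:
        raise ValueError("need at least one file")
    weights = [1.0 / rank ** exponent for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


popularity = zipf_popularity(500)  # 500 files is an illustrative choice
top_decile_share = sum(popularity[: len(popularity) // 10])
print("share of requests hitting the top 10% of files:", round(top_decile_share, 3))
```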

Two baselines with the same memory overhead.

EC-Cache: (k, n) coding scheme.

Selective replication: copy the top 10% most popular files to 4 replicas.

Read latency with stragglers.

Manually inject stragglers.

Bing cluster trace.

Resilient to stragglers.

Up to 40% improvement in latency.

Revisiting this talk

Load-balancing cluster caches with selective partition

Split files into partitions based on their popularity.


No encoding/decoding overhead.

Split files into just enough partitions for load balancing.
