Operating-System-Level Handling Techniques for Improving Memory Reliability

Title
Operating-System-Level Handling Techniques for Improving Memory Reliability
Alternative Title
Don’t Delete, Replicate, or Chase: Use What You Can
Author(s)
백승재; 송유재; 최우열; 조상연
KIOST Author(s)
Baek, Seung Jae (백승재)
Alternative Author(s)
백승재; 송유재
Publication Year
2017-02-15
Abstract
As the number of on-chip cores scales, the simultaneous execution of multiple threads or applications increases the demand for main memory. Consequently, a main memory large enough to hold the working sets of concurrently executing threads is needed. Unfortunately, a number of recent studies indicate that DRAM is among the leading causes of hardware crashes in data centers [1], [2]. Indeed, scaling DRAM to smaller feature sizes has made it vulnerable to frequent errors. While errors can be masked by error correction codes, error patterns beyond the correction capability of those codes eventually emerge, so failures are inevitable. To counter failures attributed to hard faults, the common practice among system designers is to retire the physical pages in which hard faults reside [3], [4]. Although this practice effectively isolates the erroneous effects of hard faults, it is overly aggressive: an entire physical page is wasted for the sake of a few faulty bits. In this paper, we propose two software techniques to reuse physical pages with hard faults. The first technique makes faulty pages available to the slab allocator, while the second makes them available for dynamic user-space allocation. Preliminary experimental results show that the proposed techniques can safely run a system in which 12.5% of the physical pages are dead.
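The abstract does not say how the slab allocator steers clear of the faulty bits. One plausible reading, shown here as a hypothetical Python model rather than the authors' kernel implementation, is to split a faulty page into fixed-size object slots and withhold only the slots that overlap a known faulty byte. `PAGE_SIZE`, `FaultyPageSlab` and its methods are illustrative names, and the 4 KiB page size and byte-granularity fault locations are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumption: 4 KiB physical pages


@dataclass
class FaultyPageSlab:
    """Hypothetical model of one slab page that still contains hard faults.

    The page is split into fixed-size object slots; any slot overlapping a
    known faulty byte is never handed out, while the rest remain usable.
    """

    obj_size: int
    faulty_bytes: set[int]
    _bad_slots: set[int] = field(init=False, repr=False)
    _free: list[int] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        if self.obj_size <= 0 or PAGE_SIZE % self.obj_size:
            raise ValueError("obj_size must be a positive divisor of PAGE_SIZE")
        if any(not 0 <= b < PAGE_SIZE for b in self.faulty_bytes):
            raise ValueError("faulty byte offset lies outside the page")
        # A slot is bad if a faulty byte falls in [slot*size, (slot+1)*size).
        self._bad_slots = {b // self.obj_size for b in self.faulty_bytes}
        n_slots = PAGE_SIZE // self.obj_size
        # Stored highest-first so that pop() hands out the lowest offset first.
        self._free = [
            s * self.obj_size
            for s in reversed(range(n_slots))
            if s not in self._bad_slots
        ]

    @property
    def usable_fraction(self) -> float:
        """Share of the page still usable (retiring the page would give 0.0)."""
        n_slots = PAGE_SIZE // self.obj_size
        return (n_slots - len(self._bad_slots)) / n_slots

    def alloc(self) -> int | None:
        """Return the byte offset of a healthy free slot, or None if exhausted."""
        return self._free.pop() if self._free else None

    def free(self, offset: int) -> None:
        """Return a previously allocated slot to the free list."""
        if offset % self.obj_size or not 0 <= offset < PAGE_SIZE:
            raise ValueError("offset is not a slot boundary in this page")
        if offset // self.obj_size in self._bad_slots:
            raise ValueError("faulty slot was never allocatable")
        if offset in self._free:
            raise ValueError("double free")
        self._free.append(offset)


if __name__ == "__main__":
    slab = FaultyPageSlab(obj_size=64, faulty_bytes={100, 130})
    print(slab.usable_fraction)  # 0.96875
    print(slab.alloc(), slab.alloc())  # 0 192 (slots at 64 and 128 are skipped)
```

With 64-byte objects and faults at bytes 100 and 130, only two of the 64 slots are lost, so about 97% of the page stays usable, whereas retiring the page forfeits all of it. The trade-off in this sketch is that larger object sizes lose more bytes per fault.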
URI
https://sciwatch.kiost.ac.kr/handle/2020.kiost/24222
Bibliographic Citation
한국반도체학술대회 (Korean Conference on Semiconductors), p. 1, 2017
Publisher
한국반도체학회 (Korean Semiconductor Society)
Type
Conference
Language
English
