Operating System-Level Techniques for Improving Memory Reliability
-
Title
- Operating System-Level Techniques for Improving Memory Reliability (메모리 신뢰성 향상을 위한 운영체제 수준 처리 기법)
-
Alternative Title
- Don’t Delete, Replicate, or Chase: Use What You Can
-
Author(s)
- 백승재; 송유재; 최우열; 조상연
- KIOST Author(s)
- Baek, Seung Jae(백승재)
-
Alternative Author(s)
- 백승재; 송유재
-
Publication Year
- 2017-02-15
-
Abstract
- As the number of on-chip cores scales, the simultaneous execution of multiple threads or applications increases the demand for main memory. Consequently, a main memory large enough to hold the working sets of concurrently executing threads is needed. Unfortunately, a number of recent studies indicate that DRAM is among the leading causes of hardware crashes in data centers [1], [2]. Scaling DRAM to small feature sizes has made it vulnerable to frequent errors. While errors can be masked by error correction codes, error patterns beyond the capability of those codes eventually form, so failures are inevitable. To counter failures attributed to hard faults, the common practice among system designers is to retire the physical pages where hard faults reside [3], [4]. Although this practice effectively isolates the effects of hard faults, it is overly aggressive: an entire physical page is wasted for the sake of a few faulty bits. In this paper, we propose two software techniques for reusing physical pages with hard faults. The first technique makes the faulty pages available to the slab allocator, while the second makes them available for dynamic user-space allocation. Preliminary experimental results show that the proposed techniques can safely run a system in which 12.5% of the physical pages are dead.
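The idea of reusing a page with a few faulty bits, rather than retiring it, can be illustrated with a toy model. The sketch below is not the authors' implementation: the function name `usable_slots`, the 4096-byte page size, and the representation of faults as a set of byte offsets are all assumptions made for illustration. It divides a page into fixed-size object slots, as a slab-style allocator might, and keeps only the slots that contain no faulty byte.

```python
PAGE_SIZE = 4096  # bytes; assumed page size, not stated in the abstract


def usable_slots(faulty_offsets, object_size, page_size=PAGE_SIZE):
    """Return start offsets of fixed-size slots that contain no faulty byte.

    faulty_offsets: iterable of byte offsets within the page known to be bad
                    (hypothetical input; the abstract does not say how faults
                    are recorded).
    object_size:    size in bytes of the objects carved from the page.
    """
    if object_size <= 0 or object_size > page_size:
        raise ValueError("object_size must be in 1..page_size")
    # Index of every slot that overlaps at least one faulty byte.
    bad_slots = {
        off // object_size for off in faulty_offsets if 0 <= off < page_size
    }
    total_slots = page_size // object_size
    return [i * object_size for i in range(total_slots) if i not in bad_slots]


if __name__ == "__main__":
    faulty = {100, 101, 3000}  # three bad bytes, made up for the example
    slots = usable_slots(faulty, object_size=64)
    print(len(slots), "of", PAGE_SIZE // 64, "slots remain usable")
```

In this model, a page with three faulty bytes loses only the two 64-byte slots that contain them; page retirement would discard all 64 slots.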
-
URI
- https://sciwatch.kiost.ac.kr/handle/2020.kiost/24222
-
Bibliographic Citation
- Korean Conference on Semiconductors (한국반도체학술대회), p. 1, 2017
-
Publisher
- 한국반도체학회
-
Type
- Conference
-
Language
- English