Multi-armed Bandit Online Learning Based on POMDP in Cognitive Radio

  • Publisher
    Science & Engineering Research Support Center (IJSH)
  • Journal
    International Journal of Smart Home
  • Issue
    Vol.8 No.3 (2014.05)
  • Pages
    pp.151-162
  • Authors
    Juan Zhang, Hesong-Jiang, Hong Jiang, Chunmei Chen
  • Language
    English (ENG)
  • URL
    https://www.earticle.net/Article/A230895

※ Access may be restricted because the agreement period with the full-text provider has ended.

Full-Text Information

Abstract

English
In cognitive radio, most existing research on spectrum sharing has two weaknesses. First, the problem is largely formulated as a Markov decision process (MDP), which requires complete knowledge of the channel. Second, most studies perform online learning based on the perceived channel. To address these problems, this paper proposes a new algorithm: if the authorized user is present in the current channel, the secondary user sends conservatively at a low rate; otherwise it sends aggressively. When sending conservatively, the state of the channel is not directly observable, so the problem becomes a Partially Observable Markov Decision Process (POMDP). We first establish the optimal threshold when the channel is known, then consider optimal transmission when the channel is unknown and model it as a multi-armed bandit. We obtain the optimal K-conservative policy through the UCB algorithm and improve the convergence speed with the UCB-TUNED algorithm. Simulation and analysis results show that multi-armed bandit online learning under a not fully known channel yields the same K-conservative policy as the optimal threshold policy under a known channel.
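
The abstract names the UCB and UCB-TUNED algorithms without stating their index rules. The sketch below is a generic illustration of the standard UCB1 and UCB1-Tuned indices on a Bernoulli bandit, not the authors' implementation. It assumes that each arm stands in for one candidate value of K in the K-conservative policy and that rewards are normalized to [0, 1]; the function names, arm means, and horizon are hypothetical.

```python
import math
import random


def ucb1_index(mean, pulls, t):
    """UCB1 index: empirical mean plus an exploration bonus."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def ucb1_tuned_index(mean, mean_sq, pulls, t):
    """UCB1-Tuned index: the bonus uses a variance estimate capped at 1/4."""
    variance = max(mean_sq - mean * mean, 0.0)
    v = variance + math.sqrt(2.0 * math.log(t) / pulls)
    return mean + math.sqrt(math.log(t) / pulls * min(0.25, v))


def run_bandit(arm_means, horizon, tuned=False, seed=0):
    """Play a Bernoulli bandit and return how often each arm was pulled.

    arm_means are hypothetical success probabilities; in the paper's
    setting each arm would correspond to one candidate K (an assumption).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    sums = [0.0] * n_arms      # sum of rewards per arm
    sq_sums = [0.0] * n_arms   # sum of squared rewards per arm

    def index(k, t):
        mean = sums[k] / pulls[k]
        if tuned:
            return ucb1_tuned_index(mean, sq_sums[k] / pulls[k], pulls[k], t)
        return ucb1_index(mean, pulls[k], t)

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(range(n_arms), key=lambda k: index(k, t))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
    return pulls


if __name__ == "__main__":
    means = [0.1, 0.3, 0.9]  # hypothetical per-arm expected rewards
    print("UCB1 pulls:      ", run_bandit(means, 2000))
    print("UCB1-Tuned pulls:", run_bandit(means, 2000, tuned=True))
```

Because UCB1-Tuned scales its bonus by a variance estimate of at most 1/4, it typically explores suboptimal arms less than UCB1, which is in line with the faster convergence the abstract reports for UCB-TUNED.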

Table of Contents

Abstract
 1. Introduction
 2. The System Model
  2.1. POMDP Model
  2.2. Channel Modeling based on POMDP
 3. The known Channel State of the Optimal Transmission Threshold Strategy
  3.1. K Conservative Strategy Structure Modeling
  3.2. The Challenge of the K Conservative Strategy
  3.3. UCB Algorithm
 4. Simulation Results
  4.1. The off-line Algorithm for Optimal Transmission Threshold Strategy
  4.2. Online Learning Algorithm of K Arm Gambling Machine in the Unknown Channel State
 5. Conclusions
 Acknowledgements
 References

Keywords

spectrum sharing, multi-armed bandit, online learning, Partially Observable Markov Decision Process

Authors

  • Juan Zhang [ the Open Fund of Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, School of Information Engineering, Southwest University of Science and Technology, Mianyang, China ]
  • Hesong-Jiang [ the Open Fund of Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, School of Information Engineering, Southwest University of Science and Technology, Mianyang, China ]
  • Hong Jiang [ the Open Fund of Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, School of Information Engineering, Southwest University of Science and Technology, Mianyang, China ]
  • Chunmei Chen [ the Open Fund of Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, School of Information Engineering, Southwest University of Science and Technology, Mianyang, China ]

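Publication Information

Publisher

  • Publisher name
    Science & Engineering Research Support Center, Republic of Korea (IJSH)
  • Year founded
    2006
  • Field
    Engineering > Computer Science
  • About
    1. Surveys and research on security engineering
    2. Research on and presentation of applied technologies in security engineering
    3. Hosting academic conferences and exhibitions on security engineering
    4. Mutual cooperation and information exchange on security engineering technology
    5. Standardization projects and establishment of specifications for security engineering
    6. Promotion of industry-academia-research cooperation in security engineering
    7. International academic exchange and technical cooperation
    8. Publication of journals on security engineering
    9. Other activities necessary to achieve the organization's purposes

Journal

  • Journal name
    International Journal of Smart Home
  • Frequency
    Bimonthly
  • pISSN
    1975-4094
  • Coverage period
    2008~2016
  • Decimal classification
    KDC 505 DDC 605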
