Korean Institute of Next Generation Computing — Conference Proceedings

Publication Information
  • Material type
    Conference proceedings
  • Publisher
    Korean Institute of Next Generation Computing
  • Frequency
    Semiannual
  • Coverage period
    2021 ~ 2025
  • Subject classification
    Engineering > Computer Science
  • Decimal classification
    KDC 566 DDC 004
ICNGC 2025 The 11th International Conference on Next Generation Computing 2025 (92 papers)

Oral Session A-1: Computer Vision

2

Artificial General Intelligence (AGI) introduces a new class of ethical and technical challenges because it is expected to operate with autonomous goal formation, extended temporal reasoning, and reflective metacognition that go far beyond the constraints of current narrow AI systems. These capabilities imply that ethical safeguards cannot remain external layers or post-hoc filters; instead, they must function as internal cognitive components embedded within the AGI’s core architecture. To address this need, this paper proposes a Modular Ethical AGI Framework composed of three foundational subsystems: a Hybrid Alignment Stack that unifies top-down normative principles with bottom-up, data-driven moral priors; a Moral Reflection Module capable of contextual ethical assessment, symbolic interpretation, and counterfactual reasoning; and a Metacognitive Consistency Layer that performs coherence evaluation, reflective self-correction, and justification generation. To operationalize these subsystems, we introduce an Ethical Deliberation Cycle, which provides a structured sequence for moral feature extraction, normative activation, action evaluation, conflict resolution, reflective consistency checking, and explanation generation. This framework directly addresses limitations widely observed in current alignment research, including rule brittleness [1], lack of contextual nuance [2], dataset bias [3], and absence of principled coherence mechanisms [4]. It further identifies potential failure modes such as value–rule conflicts, cultural narrowness in moral datasets, symbolic grounding gaps, and metacognitive overconfidence. We argue that ethical reasoning is not an optional enhancement but a structural necessity for AGI safety, and that the proposed modular architecture offers a viable starting point for designing trustworthy and value-aligned autonomous intelligence.

3

High-quality annotations are crucial for accurate object detection, but widely used datasets like MS-COCO face issues such as missing objects, duplicate labels, and inaccurate bounding boxes. To overcome these problems, MJ-COCO was created through model-driven refinement, increasing annotations from 860,001 to 1,221,970 instances. This paper presents a comparative analysis of MS-COCO and MJ-COCO, with a focus on the accuracy of bounding box measurements. We designed a human-in-the-loop evaluation framework with custom software that enables side-by-side visualization of annotations, allowing evaluators to classify outcomes as improved, worse, or ambiguous. We collectively evaluated 41,754 annotations through a human-in-the-loop verification process involving fifteen human evaluators. The results demonstrate that 25,754 annotations were improved, 2,398 were worsened, and 13,623 were ambiguous, for a total quality score of 89.49%. These findings show that MJ-COCO considerably enhances annotation quality and precision over MS-COCO, making it a more consistent and accurate standard for advancing object detection studies. The dataset and software code are publicly available on Kaggle: https://www.kaggle.com/datasets/mjcoco2025/mj-coco-2025.

4

Object detection (OD) is a fundamental task in computer vision. However, progress is often hindered by limitations in existing datasets, including human annotation errors, reliance on manual annotation, missing annotations due to occlusion, and domain specificity. To address these challenges, this work proposes an automatically generated synthetic single-view dataset for OD. The dataset was generated in Unity by constructing a 3D virtual city with a single-camera surveillance system, providing diverse perspectives and calibrated viewpoints. Object metadata, including position and dimensions, was automatically extracted and projected into the 2D image plane to generate accurate bounding boxes. Annotations were normalized into YOLO format, with invalid boxes removed, resulting in a single-view dataset that is consistent, precise, and free from manual labeling errors, while still reflecting real-world challenges such as occlusion and object variation. Two versions of the dataset, original and refined, were created to evaluate the effect of bounding box quality on detection performance. An experimental evaluation using the YOLOv11 model demonstrated that the proposed dataset substantially improved detection performance, yielding notable gains in precision, recall, and mean average precision (mAP). These results underscore the importance of accurate dataset curation and highlight the potential of synthetic datasets to advance single-view OD in applications such as surveillance, autonomous systems, and robotics.
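The metadata-to-annotation step described above (projecting 3D object extents into the image plane and normalizing to YOLO format) can be sketched as follows; the pinhole intrinsics and the cube example are illustrative assumptions, not the parameters of the paper's Unity camera:

```python
import numpy as np

def project_points(points_3d, fx, fy, cx, cy):
    """Project 3D camera-frame points (N, 3) onto the image plane."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=1)

def yolo_bbox(points_2d, img_w, img_h):
    """Enclose projected corners in a box, normalized to YOLO format
    (x_center, y_center, width, height), all in [0, 1]."""
    u_min, v_min = points_2d.min(axis=0)
    u_max, v_max = points_2d.max(axis=0)
    # Clip to image bounds; degenerate boxes are removed, as in the
    # refinement step described in the abstract.
    u_min, u_max = np.clip([u_min, u_max], 0, img_w)
    v_min, v_max = np.clip([v_min, v_max], 0, img_h)
    w, h = u_max - u_min, v_max - v_min
    if w <= 0 or h <= 0:
        return None  # invalid box
    return ((u_min + u_max) / 2 / img_w, (v_min + v_max) / 2 / img_h,
            w / img_w, h / img_h)

# Eight corners of a 1 m cube, 10 m in front of the camera.
corners = np.array([[x, y, 10 + z] for x in (-0.5, 0.5)
                    for y in (-0.5, 0.5) for z in (-0.5, 0.5)])
box = yolo_bbox(project_points(corners, 800, 800, 640, 360), 1280, 720)
```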

5

The accumulation of dust on solar panels causes a significant decrease in panel efficiency, particularly in dry and semi-arid climates where energy yield is most affected. This paper discusses how the VGG16 deep learning (DL) model can be applied to the real-time detection of dust covering solar panels to enhance their maintenance and improve energy efficiency. VGG16, pre-trained on large image datasets, is fine-tuned to classify solar panels as clean or dusty. The model is trained on a comprehensive dataset of solar panel images captured under different environmental conditions. This approach minimizes energy loss, reduces maintenance costs, and enhances the overall performance and lifespan of solar panels. The method holds considerable promise for solar farms seeking to optimize cleaning schedules and maximize energy production, promoting more sustainable solar energy solutions.

6

Pothole detection remains a critical challenge in road maintenance and safety management, as potholes deteriorate road surfaces, compromise vehicle safety, and increase maintenance costs. Traditional pothole detection methods relying on manual inspection or simple image processing are often labor-intensive, prone to human error, and lack adaptability to varying road conditions. Meanwhile, modern approaches utilizing single-stage object detectors such as YOLO variants have provided real-time detection capabilities but tend to suffer in accurately localizing potholes at higher Intersection over Union (IoU) thresholds, especially when faced with the irregular shapes and scale variability characteristic of real-world potholes. To overcome these limitations, a multi-stage detection framework based on Cascade Region-based Convolutional Neural Network (Cascade R-CNN) with a ResNet-50 backbone and a Feature Pyramid Network (FPN) was developed. This framework employs progressive bounding box refinement through multiple detection stages with increasingly strict IoU thresholds, resulting in improved localization precision. The model was trained and evaluated on a meticulously curated dataset of more than 30,000 images featuring diverse pothole instances. It achieves a mean Average Precision (mAP) of 0.653 across IoU thresholds from 0.5 to 0.95, surpassing the baseline Faster R-CNN by 4.3 points and outperforming YOLOv8 by 5 points. On an NVIDIA RTX 4090 GPU, the proposed model runs at approximately 80–90 frames per second, which enables near-real-time execution and renders it practical for integration into automated road inspection and maintenance systems. These results indicate that the proposed Cascade R-CNN framework offers a robust and effective solution for high-accuracy pothole detection, addressing the shortcomings of existing detection methods in complex road environments.
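The progressive refinement idea can be illustrated with a minimal sketch: each stage re-selects positives under a stricter IoU threshold (0.5/0.6/0.7 is the canonical Cascade R-CNN schedule; the paper's exact settings may differ). A full detector would also regress box coordinates between stages, which is omitted here:

```python
def iou(a, b):
    """Intersection over Union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def cascade_positives(proposals, gt, thresholds=(0.5, 0.6, 0.7)):
    """Return the proposals surviving each stage's stricter IoU cut.
    Real Cascade R-CNN also regresses boxes between stages; this
    sketch only shows the progressive threshold filtering."""
    stages = []
    for t in thresholds:
        proposals = [p for p in proposals if iou(p, gt) >= t]
        stages.append(list(proposals))
    return stages
```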

7

Deep learning (DL), a specific area of artificial intelligence research, has become a powerful tool for solving complex problems in computer vision and beyond. One real application is the detection of real and fake faces, which has become increasingly important with the rise of deepfake technology. Fake images pose serious dangers to information security, trust in multimedia content, and even social stability. This paper proposes a deep learning model based on the VGG-16 architecture to distinguish between real and fake faces with high accuracy and reliability. The performance of the proposed model was evaluated using a number of metrics, including accuracy, specificity, recall, precision, and misclassification rate. The results showed that the model obtained an excellent accuracy of 99.61% with a very low misclassification rate of 0.39%. It achieved a specificity of 99.15%, meaning nearly all fake faces were identified correctly, and a precision of 99.29%, ensuring that nearly all faces classified as real were indeed real. The recall of the model was 100%, meaning all real faces were correctly identified. These results demonstrate how effective DL, in this case a pre-trained model such as VGG-16, is at recognizing real and fake faces, and show how strong and reliable the proposed model is.

Oral Session B-1: Vision Applications

8

The phenotypic characteristics of plants, including their length and width, are key indicators for evaluating growth status. In this study, we propose a robust framework for radish phenotype evaluation based on an improved SOLOv2 instance segmentation algorithm and a dataset of 1100 annotated images. The enhanced model enables precise segmentation of radish components, facilitating accurate measurement of leaf and root size. Furthermore, we integrate a Channel–Spatial Attention Module (CSAM) into the feature extraction stage to optimize the backbone, and incorporate soft attention mechanisms into the Feature Pyramid Network (FPN) to enhance its representation capability. Experimental evaluations show that the improved SOLOv2 model achieves an average segmentation accuracy of 94.3%. The proposed system significantly reduces the labor and time required by traditional measurement methods.

9

We introduce a reference-guided, fully automatic mask generation framework that does not rely on textual prompts or manual annotations. The approach first uses Segment Anything Model (SAM) with automatic mask generation (AMG) to produce multiple mask candidates. Each candidate is then scored against the reference image in the CLIP semantic space. A robust Top-K selection with prior reweighting favors plausible regions and suppresses small, off-center, or abnormal aspect-ratio masks. Finally, morphological closing and Gaussian feathering yield refined hard/soft masks that can be directly consumed by inpainting or blending modules. Experiments on a COCO subset and our in-house images show strong performance on segmentation metrics (IoU, Dice) and perceptual measures (FID, LPIPS, CLIP-Score), while avoiding the cost of manual masks. This enables streamlined asset preparation for metaverse content creation, immersive AR/VR scenes, and large-scale digital twins where zero-interaction mask generation is crucial.
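The Top-K selection with prior reweighting might be sketched as below; the specific prior forms and weights are illustrative assumptions, with candidate and reference CLIP embeddings passed in as arrays:

```python
import numpy as np

def score_masks(mask_embs, ref_emb, areas, centers, img_wh, k=3,
                min_area_frac=0.01):
    """Rank mask candidates by CLIP-space cosine similarity to the
    reference, reweighted by simple priors (illustrative forms, not
    the paper's exact weights): tiny masks and off-center masks are
    penalized. Returns indices of the top-k candidates."""
    sim = mask_embs @ ref_emb / (
        np.linalg.norm(mask_embs, axis=1) * np.linalg.norm(ref_emb))
    # Area prior: heavily down-weight masks below a minimum fraction
    # of the image area.
    area_frac = areas / (img_wh[0] * img_wh[1])
    area_prior = np.where(area_frac < min_area_frac, 0.1, 1.0)
    # Center prior: penalize mask centroids far from the image center.
    center = np.array(img_wh) / 2
    dist = np.linalg.norm(centers - center, axis=1) / np.linalg.norm(center)
    center_prior = 1.0 - 0.5 * dist
    score = sim * area_prior * center_prior
    return np.argsort(score)[::-1][:k]
```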

10

This study concentrates on developing an automated process for creating a 3D environment utilizing satellite imagery, a segmentation algorithm, and geospatial data. Traditional methods for crafting a 3D environment primarily rely on manually sculpting terrain and generating 3D objects, which requires substantial time, effort, and resources from the developer. We introduce a system that combines satellite images, digital terrain models, and building segmentation through Python programming to create 3D environments in Unreal Engine. The implementation includes a Python Tkinter GUI for data collection and preprocessing, Mask R-CNN for building segmentation, and OpenStreetMap (OSM) data to improve data availability and visualization. The system is evaluated by generating 3D scene environments from satellite image input and incorporating geospatial datasets to analyze and measure the visual similarities between actual and generated 3D environments.

11

Precision agriculture increasingly relies on advanced technologies to enhance sustainability and productivity. Among these, deep learning and machine learning play a critical role in developing automated systems capable of accurately identifying plant diseases. This study presents a comparative analysis of various deep learning models for plant disease classification. Specifically, we employ transfer learning using pre-trained architectures such as VGG16, ResNet-50, DenseNet-121, and EfficientNet-B0, alongside a custom convolutional neural network (CNN) trained from scratch. The models are evaluated using a dataset containing images of both healthy and diseased plants. Experimental results indicate that transfer learning models outperform the custom CNN, with DenseNet-121 and EfficientNet-B0 offering the optimal balance between computational efficiency and classification accuracy. These findings underscore the potential of deep learning techniques to support precision agriculture by enabling faster, more accurate, and scalable disease detection—reducing the reliance on manual inspection and facilitating timely interventions.

12

Three-dimensional (3D) point clouds provide detailed geometric understanding of real-world environments but remain challenging to process due to their sparse and unordered nature. Contrastive learning has emerged as a powerful self-supervised approach for learning representations from unlabeled 3D point cloud data. At the core of these methods lie encoder architectures that project raw points into discriminative latent spaces. This brief survey highlights major encoder families used in 3D contrastive learning and analyzes their design principles, strengths, and limitations. We further discuss how encoder choice influences downstream performance and outline research trends toward efficient, multimodal, and real-time contrastive frameworks.

13

Cone-Beam Computed Tomography (CBCT) plays a central role in Image-Guided Radiation Therapy (IGRT), but its relatively long acquisition time often leads to motion artifacts that reduce diagnostic quality. This work presents a framework for artifact correction based on residual learning within a conditional Denoising Diffusion Probabilistic Model (DDPM). In this setting, the model learns to predict the residual artifact component instead of the entire CT image. To encourage stable learning, a hybrid loss function incorporating L1 regularization on the predicted residual is introduced. The L1 term is intended to promote sparsity, guiding the model to focus on localized artifact regions while maintaining robustness against anatomical inconsistencies between CBCT and CT pairs. Experiments on paired CBCT-CT datasets showed improved quantitative and perceptual results compared to baseline diffusion and residual models, suggesting that the sparsity constraint may contribute to more reliable artifact suppression.
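A minimal sketch of the described hybrid objective, assuming a simple mean formulation and an illustrative weight `lam` (the paper's exact loss terms and weighting are not given in the abstract):

```python
import numpy as np

def hybrid_residual_loss(pred_residual, true_residual, lam=0.1):
    """Hybrid loss: a data term on the predicted artifact residual
    plus an L1 sparsity penalty on the prediction itself, encouraging
    the model to focus on localized artifact regions. The exact form
    and lam value here are illustrative assumptions."""
    mse = np.mean((pred_residual - true_residual) ** 2)
    l1 = np.mean(np.abs(pred_residual))
    return mse + lam * l1
```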

Oral Session A-2: Language Processing

14

The proliferation of multimodal systems demands efficient management of heterogeneous computing resources. However, most GPU-centric frameworks still rely on static scheduling, resulting in unbalanced utilization and energy waste. This paper presents HERMES (Heterogeneous Efficient Resource Management and Execution Scheduling), an adaptive scheduling framework designed for efficient scheduling in heterogeneous multimodal AI systems. HERMES introduces HScore, a unified metric that quantifies heterogeneous efficiency by integrating performance (FPS) and power consumption. Experimental results on a ViT-based multimodal benchmark show that HERMES achieves up to 12.7% faster execution and 15.8% higher energy efficiency than static hybrid baselines, while maintaining balanced CPU–GPU utilization. These findings confirm that adaptive feedback scheduling significantly enhances both scalability and sustainability in multimodal AI systems.
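The abstract does not define HScore's exact formula; one plausible reading, performance divided by power with tunable exponents, and its use in a device-selection step, can be sketched as:

```python
def hscore(fps, power_w, alpha=1.0, beta=1.0):
    """Illustrative heterogeneous-efficiency score: higher FPS is
    better, higher power draw is worse. The form and the alpha/beta
    knobs are assumptions, not taken from the HERMES paper."""
    return (fps ** alpha) / (power_w ** beta)

def pick_device(stats):
    """stats: {device_name: (fps, power_w)} -> device with best HScore."""
    return max(stats, key=lambda d: hscore(*stats[d]))
```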

15

We investigate whether Large Language Models (LLMs) can learn strategic reasoning and social deception abilities through Reinforcement Learning (RL) finetuning in a multi-agent “Mafia Game” simulation environment. We finetune a baseline 7B model using Proximal Policy Optimization (PPO) with sparse binary rewards based on game outcomes. Training samples are collected through an opponent pool consisting of different versions of the finetuned model. Our experimental results show that the finetuned model outperforms the baseline model by a significant margin and suggest that strategic capabilities unseen in baseline models emerge.

16

The exponential growth of digitally produced content has necessitated the advent of smart, automated systems that can generate quality, search-optimized materials. IntelliWriter.io unveils a multi-model AI architecture that harmonizes transformer-based language models for SEO-centric content creation, improvement, and dissemination. In contrast to standard text generators, IntelliWriter employs domain-specific fine-tuning, keyword clustering, and contextual weighting to deliver both relevance and readability. Through integration with platforms such as WordPress, Shopify, and Wix, it facilitates seamless auto-publishing and real-time metadata optimization. Experimental assessment suggests that IntelliWriter cuts editing time by 62% and improves SEO ranking performance by 38%, making it a next-generation framework for intelligent content automation.

17

Artificial intelligence (AI) is a significant tool in modern military operations in that it helps to analyze large volumes of strategic, tactical, and operational data. However, current large language models (LLMs) such as GPT-4 or Falcon have difficulty resolving problems in defense-specific contexts because of issues related to security, data confidentiality, and the lack of explainability. This paper presents MilGPT, a secure and explainable LLM framework aimed specifically at solving military problems. The model integrates fine-tuned open-source architectures with domain-specific defense datasets to elevate intelligence synthesis, decision-making, and threat prediction. Performance evaluation on benchmark tasks shows that MilGPT achieves a 27% increase in contextual accuracy, an 18% reduction in hallucination rate, and a 33% improvement in explainability as measured by gradient-based feature attribution. The proposed framework makes military intelligence systems not only secure but also adaptive and human-interpretable, thus setting up a basis for the coming generation of AI models capable of defense-grade tasks.

18

This study explores university students’ perceptions, willingness, and concerns regarding the use of Generative Artificial Intelligence (GenAI) technologies—such as ChatGPT—in higher education across India. A survey design involving 1,197 undergraduate and postgraduate students from diverse disciplines was employed to assess their familiarity, attitudes, and expectations toward GenAI. Findings reveal that most students possess a strong understanding of GenAI’s capabilities and limitations, recognizing its potential to enhance personalized learning, research efficiency, and writing support. Students appreciated GenAI’s accessibility, time-saving features, and ability to provide 24/7 assistance, aligning with previous studies (Atalas, 2023; Berg, 2023). However, notable concerns emerged regarding the reliability, transparency, privacy, and ethical implications of AI use, echoing issues raised by Peres et al. (2023). Participants also expressed apprehension about over-reliance, diminished creativity, reduced social interaction, and future job insecurity (Ghotbi et al., 2022). Overall, the findings highlight the need for responsible GenAI integration in education through enhanced AI literacy, ethical guidelines, and adaptive pedagogical strategies (Biggs, 2011). By addressing students’ diverse perspectives, institutions can leverage GenAI to improve teaching, learning, and preparation for an AI-driven future.

19

Among individuals who have difficulty phonating due to laryngectomy or voice disorders, the need for silent-speech-based communication technologies is steadily increasing. Recent studies reconstruct acoustic speech from silent speech by extracting audio features from electromyography (EMG) signals with a transduction model, aligning these features with those from phonated speech, and decoding the aligned representations. Speech generated by this approach typically contains substantial noise and exhibits weak articulation and indistinct phonation. In addition, because speaker-specific voice information is modeled as a whole rather than disentangled, personalized adaptation is difficult. To improve the naturalness and articulation of synthesized speech, we adopt Diff-HierVC, a diffusion-based hierarchical voice conversion architecture, and modify the original design, which predicted targets using only phonated speech, so that target acoustic representations are predicted from EMG signals. We train the model with three disentangled features: content (w2v), mel-spectrogram, and pitch (f0), enabling voice conversion for silent speech. We also compare it with a baseline model that does not use Diff-HierVC in a listening test. The results show that the proposed model significantly improves perceived speech naturalness over the baseline.

Oral Session B-2: Mobile & Communication

20

This paper focuses on the specific gaming scenario of Fruit Ninja and presents the design and implementation of an automated visual recognition and path control system for real-time fruit detection and automatic slicing. The system employs the YOLOv11s model trained on a publicly available Fruit Ninja screenshot dataset to achieve real-time detection of fruits and bombs. Building upon an open-source automated fruit-cutting project, this work introduces a lightweight path optimization module—DAFCS (Danger-Aware Fruit Cutting Strategy)—which dynamically generates safe and efficient slicing paths based on bomb locations and fruit distances. The overall system comprises an object detection module, a tracking module, a path planning module, a mouse control module for slicing execution, and a hit evaluation module. Experimental results demonstrate that the DAFCS strategy, powered by YOLOv11s, significantly improves fruit hit rate and path efficiency compared to traditional sequential strategies using YOLOv8, while maintaining acceptable response speed. This system illustrates the practical value of integrating object detection and trajectory control techniques in interactive gaming scenarios and provides a valuable reference for future research on automated control in similar contexts.
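A danger-aware ordering in the spirit of DAFCS can be sketched with a greedy rule (an illustrative stand-in; the actual strategy's rules are not detailed in the abstract): repeatedly slice the nearest fruit whose cut segment stays outside a safety radius around every bomb.

```python
import math

def seg_point_dist(p, q, b):
    """Shortest distance from point b (a bomb) to segment p-q."""
    px, py = q[0] - p[0], q[1] - p[1]
    denom = px * px + py * py
    t = 0.0 if denom == 0 else max(0.0, min(1.0,
        ((b[0] - p[0]) * px + (b[1] - p[1]) * py) / denom))
    return math.dist(b, (p[0] + t * px, p[1] + t * py))

def plan_cut(cursor, fruits, bombs, safe_radius=50.0):
    """Greedy danger-aware ordering: repeatedly cut the nearest fruit
    whose slicing segment stays clear of all bombs; stop (wait) when
    every remaining segment is dangerous."""
    path, pos, remaining = [], cursor, list(fruits)
    while remaining:
        remaining.sort(key=lambda f: math.dist(pos, f))
        nxt = next((f for f in remaining
                    if all(seg_point_dist(pos, f, b) > safe_radius
                           for b in bombs)), None)
        if nxt is None:
            break  # all remaining cuts pass too close to a bomb
        path.append(nxt)
        remaining.remove(nxt)
        pos = nxt
    return path
```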

21

This paper investigates deep learning-based SNR estimation for OFDM systems. A lightweight ResNet-inspired model is applied to estimate SNR under AWGN, Rayleigh, and Rician channels. Specifically, our model consists of two residual blocks to ensure a lightweight design. The dataset includes wide SNR ranges with realistic impairments such as fading and frequency offsets. Performance is evaluated using mean square error (MSE) and mean absolute error (MAE). Results show stable estimation across all channels with low error values in the low SNR regions.
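Training data for such an estimator can be synthesized by adding calibrated AWGN to modulated symbols, with the target SNR as the regression label; the QPSK source and seed below are illustrative:

```python
import numpy as np

def awgn(signal, snr_db, rng):
    """Add complex AWGN scaled so the resulting SNR equals snr_db."""
    p_sig = np.mean(np.abs(signal) ** 2)
    p_noise = p_sig / (10 ** (snr_db / 10))
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(signal.shape)
                                    + 1j * rng.standard_normal(signal.shape))
    return signal + noise

def empirical_snr_db(signal, noisy):
    """Measured SNR in dB, given the clean and the noisy signal."""
    noise = noisy - signal
    return 10 * np.log10(np.mean(np.abs(signal) ** 2)
                         / np.mean(np.abs(noise) ** 2))

# Illustrative QPSK source: 10,000 unit-power symbols at 10 dB SNR.
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(10000, 2))
symbols = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
noisy = awgn(symbols, snr_db=10.0, rng=rng)
```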

22

This study develops a systematic, verifiable experimental framework to clarify when and how transforming tabular data into graph structures enhances node-level classification. An AI-generated synthetic dataset with controlled numbers of nodes, classes, and class imbalance is used, and two interpretable graphs are constructed: (1) a similarity-based k-nearest-neighbors graph, with the number of neighbors and the model depth varied, and (2) a rule-based graph with explicit, transparent connection rules. Graph-based methods (GCN, GraphSAGE) approach the tabular baseline when k and depth are appropriately tuned, before performance saturates or declines due to excessive signal averaging. Rule-based graphs expose architectural differences: GraphSAGE is higher-performing and more stable, whereas GCN is more structure-sensitive and degrades with depth, implying that approaches which preserve node-specific information and flexibly aggregate signals are more robust to structural heterogeneity. Overall, the framework offers practical guidance for method selection and graph construction, particularly the choice of neighbors and depth, in a simplified, reproducible form readily extensible to real-world applications.
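The similarity-based graph construction can be sketched as a symmetric k-nearest-neighbors adjacency over the tabular features (Euclidean distance assumed; the study's exact similarity measure is not stated in the abstract):

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor adjacency from tabular features X
    of shape (n, d): each node is linked to its k closest other nodes,
    then the directed edges are symmetrized."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # exclude self-loops
    idx = np.argsort(d, axis=1)[:, :k]    # k nearest per node
    A = np.zeros((len(X), len(X)), dtype=int)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, idx.ravel()] = 1
    return np.maximum(A, A.T)             # symmetrize
```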

23

This research compares the usability of mobile apps, web apps, and hybrid apps. The results reveal deep insight into users' choices among these three app types. Mobile apps are widely used because they are easy to use and load quickly, showing that they are reliable; studies indicate that fast loading greatly improves the user experience. Convenience is paramount in satisfying this requirement: people appreciate applications with rapid access and high performance. Although hybrid apps are promising, slow performance diminishes their appeal and makes them inferior to native mobile applications. Despite widespread availability, web apps do not appear to be as favored by users as they might be, owing to their slow load speed. Iterative usability testing ensures that usability problems are detected early and fixed before long-term solutions must be implemented later in development. Usability and its attributes are determined with the help of an ANOVA model using the RSM method.
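The ANOVA step reduces to comparing between-group and within-group variance of usability scores across the three app types; a minimal one-way F statistic (the standard formula, not the paper's specific RSM design) looks like:

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square, for lists of scores per group."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F relative to the critical value indicates the app types differ significantly in usability.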

24

Social network analysis is a rapidly emerging field of research, active since the early 2000s. This research proposes a decision support system based on the integration of social network analysis and change detection techniques. The social network under study is an email dataset of former US Secretary of State Hillary Rodham Clinton, covering her personal and official email addresses. In 2015, Hillary Clinton faced controversy for using personal email accounts for governmental purposes while serving as United States Secretary of State. Some political experts and competitors maintain that Clinton's use of personal email accounts to conduct Secretary of State affairs violated protocols and federal laws that ensure convenient record keeping of government activity. Lawsuits were filed under freedom-of-information provisions over the State Department's failure to release Clinton's emails sent and received over her private server. On Monday, August 31st, the State Department released nearly 7,000 pages of Clinton's heavily redacted emails, its biggest release of emails to date; the documents were released as PDFs. In this research, we use the NodeXL tool to analyze the emails from 2011-2012, computing closeness centrality, betweenness centrality, eigenvector centrality, and PageRank with improved accuracy for all of these factors. This research employs a cleaned, normalized, and preprocessed CSV version of the dataset retrieved from an online dataset repository.
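Of the listed metrics, closeness centrality is the simplest to sketch from scratch: BFS shortest-path distances from each node, using the common (n-1)/(sum of distances) definition (tools such as NodeXL may apply a different normalization):

```python
from collections import deque

def closeness(adj):
    """Closeness centrality of each node in an undirected, connected
    graph given as {node: set(neighbors)}: (n - 1) divided by the sum
    of shortest-path distances to every other node."""
    scores = {}
    n = len(adj)
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:                      # breadth-first search from s
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total = sum(dist.values())
        scores[s] = (n - 1) / total if total else 0.0
    return scores
```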

25

This survey reviews beamforming methods that pair rate-splitting multiple access (RSMA) with integrated sensing and communications (ISAC) under Cramér–Rao-based sensing criteria. We organize the literature by three goals—joint CRB–rate optimization, rate-under-CRB design (maximize rate subject to CRB guarantees), and CRB-under-rate design (minimize CRB subject to rate or SINR constraints)—and by three hardware settings (fully digital MIMO, RIS- and STARS-assisted links, and millimeter-wave hybrid arrays). Common solver patterns combine the weighted minimum mean-squared error (WMMSE) method or fractional programming–quadratic transform (FP–QT) for the communication block with semidefinite relaxation (SDR), sequential convex approximation (SCA), or simple majorization for the sensing block, stitched together through alternating optimization (AO). A comparison table maps each study to its architecture, scenario, objective, and metrics. The main takeaway is practical: the RSMA common stream adds a useful control to traverse the CRB–rate tradeoff; surfaces improve angular diversity but require robust calibration; and hybrid designs benefit from hardware-aware formulations.

Poster Session 1 : IT Fusion Technologies etc.

26

The imbalance of datasets is a significant challenge in training deep neural networks. Especially in manufacturing, there is only one form of ‘normal’, while defects are endless. This disproportion in sample distribution makes models prone to overfitting, resulting in degraded performance. To mitigate this problem, we propose C4 (Color-Channel Concatenation with Contrastive Loss), a defect detection framework based on Siamese Networks. We performed a case study on industrial automation technologies, specifically sealant defect classification. C4 achieves an F1-score of 94.54% and an accuracy of 94.21%, demonstrating its effectiveness in handling class imbalance.
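The contrastive loss driving a Siamese network can be sketched in its standard pairwise form (the margin value is illustrative, and C4's exact formulation may differ):

```python
import numpy as np

def contrastive_loss(d, y, margin=1.0):
    """Contrastive loss over embedding distances d for pairs labeled
    y=1 (same class) or y=0 (different class): similar pairs are
    pulled together, dissimilar pairs pushed beyond the margin."""
    d, y = np.asarray(d, float), np.asarray(y, float)
    return np.mean(y * d ** 2 + (1 - y) * np.maximum(0, margin - d) ** 2)
```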

27

Video surveillance is widely used for public safety, but anomalous behaviors often manifest patterns similar to normal ones, making detection difficult. Conventional approaches reconstruct full frames into 3D to learn global structure; however, they have the limitation of greatly increased computation due to redundant information in adjacent frames. This paper proposes a method that reduces the number of frames in powers of two and compares performance and training efficiency with the full-frame approach. Based on the UCF-Crime trimmed dataset, we trained a Video Vision Transformer (ViViT); compared to the full-frame baseline, accuracy differed from −0.74% to +1.27%, while training time was shortened by up to 3.8×. These results suggest that, within the range that preserves global structure, frame reduction can serve as an efficient alternative for video anomaly detection.
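The frame-reduction schedule amounts to keeping every 2^n-th frame before the clips are fed to the ViViT; a minimal sketch:

```python
def reduce_frames(frames, factor):
    """Keep every `factor`-th frame, where factor is a power of two
    (1, 2, 4, 8, ...), matching the reduction-in-powers-of-two
    schedule described above."""
    assert factor >= 1 and factor & (factor - 1) == 0, "power of two"
    return frames[::factor]
```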

28

Visual Question Answering (VQA) models suffer from a language bias problem, where they excessively rely on textual correlations. This study proposes a plausible counterfactual data generation method, named Plausible Counterfactual Data Generation (PCDG), which utilizes Grad-CAM-based visual importance to replace significant objects in a contextually appropriate manner. By synthesizing more contextually relevant samples than other existing augmentation methods, PCDG effectively strengthens visual-language alignment. In experiments on the VQA-CP v2 benchmark, our method achieved significant performance improvements, particularly a 10.56% increase in the 'Num' category and a 2.78% increase in the 'Other' category. This indicates that the proposed method enhances the VQA model's generalization ability and robustness through debiasing.

29

Bluetooth Low Energy (BLE) based indoor positioning systems rely on accurately classifying channel conditions such as line-of-sight (LOS) or non-line-of-sight (NLOS). However, classification models trained in one building rarely generalize to another due to different floor layouts, anchor deployment, and interference patterns. Existing solutions often assume rich channel features, require labels from each new environment, or depend on fixed anchor layouts, which limits their scalability. We propose a BLE-based domain-adaptive RF channel classification framework that incorporates adversarial domain alignment and confidence-based pseudo-labeling to leverage unlabeled target data. We evaluate the approach using BLE Received Signal Strength Indicator (RSSI) data collected from three indoor areas: a corridor (source domain), a classroom (target domain), and an office room (unseen test domain). The proposed approach shows a 2% gain over the non-adaptive classification framework.
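Confidence-based pseudo-labeling can be sketched as thresholding the model's softmax outputs on unlabeled target data (the 0.9 threshold is an assumed hyperparameter, not taken from the paper):

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Select unlabeled target samples whose maximum predicted class
    probability meets the threshold; return (kept indices, pseudo
    labels) for use as extra training data in the adapted model."""
    probs = np.asarray(probs)
    conf = probs.max(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)
```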

30

Early and accurate detection of Alzheimer’s Disease (AD) is critical for timely intervention. While prior deep learning models have achieved promising results using sagittal and coronal slices, the potential diagnostic contribution of axial views remains underexplored. In this study, we propose an enhanced dual-path attention-guided convolutional neural network (CNN) that integrates multi-view 2D T1-weighted MRI slices, including parasagittal, coronal, and axial planes, to improve classification of AD, mild cognitive impairment (MCI), and cognitively normal (CN) subjects. The architecture combines a localized SNeurodCNN branch with a global Inception-v4 backbone augmented by a Convolutional Block Attention Module (CBAM). The addition of axial slices produced statistically significant improvements, increasing accuracy from 97.98% to 98.83% (p < 0.05) and enhancing AUC from 0.990 to 0.996. These results demonstrate that axial T1-weighted views provide unique diagnostic cues, including ventricular enlargement and cortical thinning, that are not fully captured by sagittal or coronal planes, thus offering complementary value in multi-view Alzheimer’s detection frameworks.

 