Question answering models are designed to automatically generate accurate and contextually relevant answers from a dataset. These models leverage advances in natural language processing, particularly transformer architectures such as BERT and RoBERTa, to understand a passage and extract the precise information needed to formulate a response. This work presents a comparative evaluation of BERT and RoBERTa models, focusing on key performance metrics: Exact Match (EM), BiLingual Evaluation Understudy (BLEU), and F1 score. In extractive question answering, start-word and end-word scores determine the precise boundaries of the answer within a passage; the accuracy of these scores is crucial for achieving high EM and F1 scores. RoBERTa's improved pretraining and fine-tuning enable it to identify these positions more accurately, yielding more precise and contextually relevant answers. This study highlights RoBERTa's superior performance, with an EM of 75%, a BLEU score of 80%, and an F1 score of 87%, outperforming BERT, which achieved 70%, 75%, and 82% on the respective metrics. These findings establish RoBERTa as the preferred model for question answering tasks, particularly in applications requiring high precision and exact answer identification. This research emphasizes the importance of start- and end-word selection in driving model performance and suggests directions for further refinement of question answering systems.
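The two mechanisms the abstract relies on can be made concrete with a minimal, self-contained sketch: selecting an answer span from per-token start/end scores, and scoring a prediction with EM and token-level F1. All names, tokens, and score values below are hypothetical illustrations, not taken from the study's models or data.

```python
# Minimal sketch (pure Python, hypothetical inputs) of span selection from
# start/end scores and of the EM and token-level F1 metrics discussed above.
from collections import Counter

def select_span(start_scores, end_scores, max_len=15):
    """Return (start, end) maximizing start_scores[i] + end_scores[j]
    subject to i <= j < i + max_len."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best

def exact_match(pred, gold):
    """EM: 1 if the normalized strings are identical, else 0."""
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    """Token-level F1 over the multiset overlap of prediction and gold."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

# Hypothetical passage tokens and per-token scores (logits).
tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris"]
start = [0.1, 0.2, 0.1, 0.1, 0.3, 2.5]
end   = [0.1, 0.1, 0.2, 0.1, 0.2, 2.8]
i, j = select_span(start, end)
answer = " ".join(tokens[i:j + 1])
print(answer)                        # "Paris"
print(exact_match(answer, "Paris"))  # 1
print(round(token_f1("in Paris", "Paris"), 3))
```

This illustrates why start/end accuracy drives EM and F1: shifting the predicted boundary by even one token (e.g. "in Paris" instead of "Paris") drops EM to 0 and only partially preserves F1.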