Image Captioning System Using Artificial Intelligence

Authors

  • Saba Syed, PMAS-Arid Agriculture University, Rawalpindi, Pakistan

Keywords:

artificial intelligence, machine learning, neural networks, image captioning

Abstract

People who are blind or visually impaired often find it difficult to fully engage with the world around them, as they cannot perceive the visual information that sighted people take for granted. As a result, they often require some form of human assistance to navigate their environment and access information. In this project, we aim to use image captioning techniques to help address this problem. By leveraging a large dataset and machine learning algorithms, we aim to convert captured and stored images into text and speech that blind individuals can easily understand. Our project is inspired by recent advances in multimodal neural networks, which have been used successfully in image captioning systems. Specifically, we use a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to translate images into text: the CNN performs visual feature extraction, while the RNN is trained on image-sentence ground-truth pairs to generate captions. One of the main challenges we face is the language barrier. While there have been numerous studies on image captioning for a single target language, we aim to develop a system that can generate captions in multiple languages. To achieve this, we use the googletrans library, a Python wrapper for the Google Translate API. Our project follows the AI essentials framework for designing AI products, as well as the Scrum methodology for managing the software development lifecycle. We begin by collecting 8,000 images from the Flickr8k dataset and use the Tesseract OCR engine to extract text from any images that contain it. We then use a pre-trained CNN as a feature extractor and feed these features into an LSTM, which generates captions. The captions are translated into multiple languages using the googletrans library and finally converted to speech using the gTTS library. Overall, our project aims to improve the lives of blind and visually impaired individuals by providing them with a more accurate and comprehensive understanding of their surroundings. By leveraging machine learning and neural networks, we hope to develop a system capable of generating accurate and useful captions in multiple languages, thereby bridging the language barrier and making it easier for blind individuals to access information.
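To make the pipeline in the abstract concrete, the sketch below wires the stages together in Python under a few explicit assumptions: InceptionV3 stands in for the unspecified pre-trained CNN, the trained LSTM decoder is represented by a placeholder generate_caption() function, and the googletrans call assumes the library's synchronous API. Only the feature-extraction, OCR, translation, and text-to-speech calls use real library interfaces (Keras, pytesseract, googletrans, gTTS); this is an illustrative sketch, not the authors' implementation.

```python
# Sketch of the captioning-to-speech pipeline described in the abstract.
# Assumptions: InceptionV3 is used here only as an example pre-trained CNN
# (the abstract does not name one); generate_caption() is a placeholder for
# the LSTM decoder trained on Flickr8k image-sentence pairs.

import numpy as np
from PIL import Image
import pytesseract                                    # Tesseract OCR wrapper
from googletrans import Translator                    # Google Translate wrapper
from gtts import gTTS                                  # Google text-to-speech
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image as keras_image

# Pre-trained CNN used purely as a visual feature extractor (2048-d pooled vector).
cnn = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_path: str) -> np.ndarray:
    img = keras_image.load_img(img_path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(keras_image.img_to_array(img), axis=0))
    return cnn.predict(x)[0]

def generate_caption(features: np.ndarray) -> str:
    # Placeholder for the trained LSTM caption decoder.
    raise NotImplementedError("load the trained captioning model here")

def describe_image(img_path: str, lang: str = "ur", out_audio: str = "caption.mp3") -> str:
    caption = generate_caption(extract_features(img_path))
    embedded_text = pytesseract.image_to_string(Image.open(img_path)).strip()
    if embedded_text:                                  # append any text found in the image
        caption = f"{caption}. Text in image: {embedded_text}"
    translated = Translator().translate(caption, dest=lang).text
    gTTS(text=translated, lang=lang).save(out_audio)   # spoken output for the user
    return translated
```

In a deployed system, the NotImplementedError stub would be replaced by the trained LSTM decoder loaded from its weights, and the saved audio file would be played back to the user.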

Published

30-03-2023

How to Cite

Image Captioning System Using Artificial Intelligence. (2023). Graduate Journal of Pakistan Review (GJPR), 3(1). https://www.pakistanreview.com/index.php/GJPR/article/view/175