<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">
<ArticleSet>
<Article>
<Journal>
				<PublisherName>University of Tehran Press</PublisherName>
				<JournalTitle>Journal of Information Technology Management</JournalTitle>
				<Issn>2980-7972</Issn>
				<Volume>18</Volume>
				<Issue>2</Issue>
				<PubDate PubStatus="epublish">
					<Year>2026</Year>
					<Month>04</Month>
					<Day>01</Day>
				</PubDate>
			</Journal>
<ArticleTitle>DeepSeek vs. ChatGPT: Which Performs Better in Python Coding?</ArticleTitle>
<VernacularTitle></VernacularTitle>
			<FirstPage>1</FirstPage>
			<LastPage>27</LastPage>
			<ELocationID EIdType="pii">107165</ELocationID>
			
<ELocationID EIdType="doi">10.22059/jitm.2026.107165</ELocationID>
			
			<Language>EN</Language>
<AuthorList>
<Author>
					<FirstName>Rania A. M.</FirstName>
					<LastName>Abdalla</LastName>
<Affiliation>Department of Information Technology, Palestine Technical University, Kadoorie, Palestine.</Affiliation>

</Author>
</AuthorList>
				<PublicationType>Journal Article</PublicationType>
			<History>
				<PubDate PubStatus="received">
					<Year>2026</Year>
					<Month>05</Month>
					<Day>28</Day>
				</PubDate>
			</History>
		<Abstract>This paper presents a comparative evaluation of two advanced large language models (LLMs), ChatGPT-4 and DeepSeek v3, on Python code generation, using 80 algorithmic problems from Codeforces grouped into four difficulty levels: Easy (800–1100), Intermediate (1200–1600), Advanced (1700–2000), and Expert (2100–2400). Standardized prompts and controlled testing conditions allow the models to be assessed on accuracy, efficiency, and code readability. Both models excel on simpler tasks, but as problem complexity increases, DeepSeek frequently outperforms ChatGPT in both accuracy and efficiency. This advantage, however, comes with reduced code clarity and increased memory use. While less precise at higher difficulty levels, ChatGPT produces more concise and idiomatic solutions. Both models showed limited competence at the Expert level; however, DeepSeek-R1 showed a slight edge. The study illustrates a trade-off between accuracy and code clarity, which can inform the selection of LLMs based on task requirements and provides a foundation for future work on optimizing code generation models for real-world applications.</Abstract>
		<ObjectList>
			<Object Type="keyword">
			<Param Name="value">ChatGPT</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">DeepSeek</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Coding</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Algorithms</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Python</Param>
			</Object>
		</ObjectList>
<ArchiveCopySource DocType="pdf">https://jitm.ut.ac.ir/article_107165_be406ea61e97982d5919f1a6d76e529f.pdf</ArchiveCopySource>
</Article>
</ArticleSet>
