The Beginner’s Guide To – RTF Malware Reverse Engineering Part 1
By BUFFERZONE Team, 17/08/2023
Target: Cybersecurity specialist
Tags: RTF, Word, Malware, Content Disarm and Reconstruction (CDR), Reverse Engineering, Zero-Trust.
The Rich Text Format (RTF) is a file format that enables the exchange of text files between different word processors across varying operating systems [1]. It’s encoded using the American Standard Code for Information Interchange (ASCII) standard and utilizes control words for formatting the text, such as bold or italic. Control words are processed by an RTF writer and reader that convert the RTF language into formatting for the word processor [1].
However, this plain text file format has a dark side. Microsoft RTF files are increasingly being used by attackers, especially for phishing attacks, due to their ability to embed various exploits. This is primarily because of the Object Linking and Embedding (OLE) feature of RTF files, which is massively abused by attackers to either link the RTF document to external malicious code or to embed other file format exploits within itself [1].
RTF File-Format
An RTF file consists of ASCII characters that represent rich text, with non-ASCII characters converted to the appropriate code values.
RTF files are made up of the following key components:
- Control Words: These are specially formatted commands that mark characters for display. Each control word begins with a backslash, is case-sensitive, and has a name made of ASCII letters (a through z and A through Z). The name ends at a space, at a digit or an ASCII minus sign (which starts a numeric parameter), or at any other character that is not a letter or a digit. Some control words take parameters written as positive or negative decimal numbers. Control words like \binN, \revdttmN, \rsidN, and \bliptagN can take values in the range of a 32-bit signed integer [1].
- Control Symbols: These represent special occurrences that have specific meaning depending on their contents. They consist of a backslash followed by a special (non-alphabetical) character and do not have any delimiters [1].
- Groups: These are another building block for the representation of RTF data [1]. A group consists of text, control words, and control symbols enclosed in braces ({ }), and groups can be nested.
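To make the three building blocks concrete, here is a minimal Python sketch that splits a small RTF snippet into control words, control symbols, groups, and text. It is an illustration under simplifying assumptions, not a complete RTF parser: the token grammar, the tokenize function, and the sample string are all hypothetical, and special cases such as the binary data that follows \binN are not handled.

```python
import re

# Simplified token grammar (an assumption for illustration, not the full spec):
# group open/close, control word with optional numeric parameter,
# \'hh hex escape, control symbol, or a run of plain text.
TOKEN_RE = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|\\(?P<word>[A-Za-z]+)(?P<param>-?\d+)? ?"
    r"|\\'(?P<hex>[0-9A-Fa-f]{2})"
    r"|\\(?P<symbol>[^A-Za-z])"
    r"|(?P<text>[^\\{}]+)"
)


def tokenize(rtf: str):
    """Yield (kind, value, depth) tuples for a small RTF snippet."""
    depth = 0
    for m in TOKEN_RE.finditer(rtf):
        if m.group("open") is not None:
            depth += 1
            yield ("group_open", "{", depth)
        elif m.group("close") is not None:
            yield ("group_close", "}", depth)
            depth -= 1
        elif m.group("word") is not None:
            param = m.group("param")
            value = (m.group("word"), int(param) if param is not None else None)
            yield ("control_word", value, depth)
        elif m.group("hex") is not None:
            yield ("hex_char", m.group("hex"), depth)
        elif m.group("symbol") is not None:
            yield ("control_symbol", m.group("symbol"), depth)
        else:
            yield ("text", m.group("text"), depth)


# Hypothetical sample: a header, a nested bold group, a control symbol, a hex escape.
sample = r"{\rtf1\ansi{\b Bold} plain\~text\'e9}"
tokens = list(tokenize(sample))
for token in tokens:
    print(token)
```

The depth value printed with each token is the group nesting level, the same kind of information that RTF analysis tools report when they list the groups of a document.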
The RTF file format is divided into different sections that denote different properties and types of content:
- Header: The RTF Version and Character Set.
- Document Text
- Destination Text
- Font Table and Font Embedding
- Code Page Support
- File Table
- Color Table
- Style Sheet
- List Table and List Override Table
- Track Changes
- Document Area: This includes Information Group and Document Formatting Properties.
- Section Text: This includes Section Formatting Properties, Headers, and Footers.
- Paragraph Text: This includes Paragraph Formatting Properties, Paragraph Borders and Shading, Bullets and Numbering, and Table Definitions.
- Character Text: This includes Font Formatting Properties, Character Borders and Shading, and Associated Character Properties.
- Special Characters, Document Variables, Bookmarks, Pictures, Objects, and Drawing Objects.
- Footnotes, Comments, Fields, Form Fields, Index Entries, Table of Contents Entries, and Bidirectional Language Support.
The last release of the RTF format, version 1.9.1, was published in March 2008 and is compatible with Word 2007 [1]. RTF has been losing traction as a primary file format since the introduction of Word 2010, because it does not support the newer features and functionality of Word. It still serves as a powerful tool for preserving the integrity of document content across different applications and platforms.
Attacking The Format
We encourage you to review the full description of the attack vectors and how to disarm them [1]. It is important to note that the attack vectors are not limited to abusing document features; they also include vulnerabilities in the RTF file reader itself.
The following are a few well-known examples of RTF file exploitation:
- Control Words Exploitation: RTF control words define how the document is presented to the user. Since control words have associated parameters and data, errors in parsing them can become a target for exploitation. Past exploits have been observed using control words to embed malicious resources. Therefore, it is important to examine any destination control word that consumes data and to extract its stream [1].
- Overlay Data Exploitation: Overlay data refers to additional data appended at the end of RTF documents. Exploit authors use this data to embed decoy files or additional resources, either in clear or encrypted form. For example, exploits for CVE-2015-1641 embedded both decoy documents and multi-stage shellcode, delimited by markers, in overlay data [2].
- Object Linking and Embedding Exploitation: RTF files can embed objects created in other applications through the OLE feature. The embedded or linked objects are represented as RTF objects, with the data for these objects stored as a parameter in the hex-encoded OleSaveToStream format. By abusing this feature, attackers can exploit parsing vulnerabilities or aid further exploitation [2].
- Delivery of Malicious Payloads: Attackers can use RTF files to deliver malicious payloads. For example, one malicious Word document with a .DOC extension was actually an RTF file, and opening it triggered GET requests that delivered a malicious payload [3].
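As an illustration of the overlay data case, the following minimal Python sketch returns whatever bytes follow the closing brace of the outermost RTF group. It is a simplified heuristic rather than a robust detector: the find_overlay function and the sample bytes are hypothetical, and the sketch assumes well-formed braces and ignores binary data introduced by \binN, which may itself contain brace characters.

```python
def find_overlay(data: bytes) -> bytes:
    """Return the bytes that follow the closing brace of the outermost group.

    Simplified heuristic: skips backslash-escaped characters and counts braces.
    Returns b"" when the outermost group is never closed.
    """
    depth = 0
    i = 0
    while i < len(data):
        c = data[i:i + 1]
        if c == b"\\":
            i += 2  # skip the escaped character, e.g. \{ \} or \\
            continue
        if c == b"{":
            depth += 1
        elif c == b"}":
            depth -= 1
            if depth == 0:
                return data[i + 1:]
        i += 1
    return b""


# Hypothetical sample: a tiny RTF document followed by appended data.
sample = b"{\\rtf1\\ansi Hello \\} world}\r\nMZ\x90\x00decoy"
overlay = find_overlay(sample).strip()
print(len(overlay), overlay[:2])
```

A trailing newline after the final brace is normal, so the usage example strips whitespace before deciding whether anything meaningful was appended.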
Malware Investigation Research Steps:
Investigating RTF malware requires a careful and systematic approach. Below are the steps we follow in our research and highly recommend:
- Isolation: Always work in a safe environment when dealing with potential malware. This usually means using a sandbox or a dedicated, isolated system that is not connected to your network. In this blog, we will work inside an Ubuntu virtual machine.
- Collection: The first step is gathering potentially malicious RTF files. These can be sourced from various locations such as spam emails and suspicious websites, or shared through threat intelligence feeds. We will use MalwareBazaar [4], a public malware repository, to obtain interesting malware for analysis.
- Static Analysis: Start by examining the RTF without executing it. This includes viewing the file metadata, the structure, the embedded objects, scripts, or unusual elements. In this blog, we will use rtfobj from the OleTools suite [5], the Yara static signature engine [6] with malware signatures from the ditekshen detection repository [7], and the Didier Stevens rtfdump tool to parse and dump different file sections [8].
- Dynamic Analysis: This involves monitoring the behavior of the RTF file when it is opened. You would typically use a sandbox environment for this, which can safely log the actions of the file, such as network connections, file system modifications, or registry changes. Dynamic analysis often reveals evasive behaviors that we missed during the static analysis or are unfamiliar with. This part is outside of this blog’s focus.
- Payload Extraction: If the RTF has an embedded payload, this will need to be extracted for further analysis. This could be another file, a script, or something else. Payload extraction can be done as part of either the static analysis or the dynamic analysis.
- Code Analysis: If the RTF includes embedded or obfuscated code, such as OLE objects or PowerShell, this must be analyzed. This involves de-obfuscating the code, understanding its functionality, and identifying any potential exploits or vulnerabilities it might use. This will be done as part of our static analysis investigation.
- Threat Intelligence Correlation: Correlate the information collected about the RTF malware with threat intelligence data. This can give information on the possible threat actors, campaigns, their methods, or whether this malware has been observed before. This step is done after the collection and during the static and dynamic analysis. When we discover Indicators of Compromise (IOCs), such as hashes of dropped files (SHA-256 or MD5), URLs, and IP addresses in the file, we can enhance our understanding of the file’s capabilities based on threat intelligence.
- Reporting: Finally, document your findings. This report should detail the characteristics of the malware, how it works, its impact, and recommended mitigation strategies.
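The collection and correlation steps above rely on file hashes and network indicators. The following minimal Python sketch, using only the standard library, shows one way to gather them from raw file bytes. The collect_iocs function, the regular expressions, and the sample bytes are assumptions for illustration; real samples often obfuscate URLs, so a plain pattern search like this will miss many indicators.

```python
import hashlib
import re

# Deliberately simple patterns (assumptions for illustration only).
URL_RE = re.compile(rb"https?://[^\s\"'<>{}\\]+", re.IGNORECASE)
IPV4_RE = re.compile(
    rb"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def collect_iocs(data: bytes) -> dict:
    """Hash the file and pull out plain-text URLs and IPv4 addresses."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "urls": sorted({u.decode("ascii", "replace") for u in URL_RE.findall(data)}),
        "ipv4": sorted({ip.decode("ascii") for ip in IPV4_RE.findall(data)}),
    }


# Hypothetical sample content; in practice, read the bytes of the file under analysis.
sample = b"{\\rtf1 visit http://example.test/payload.exe or 192.0.2.10}"
iocs = collect_iocs(sample)
for key, value in iocs.items():
    print(key, value)
```

The resulting hashes can be looked up in threat intelligence sources, and the URLs and IP addresses can be compared against what the dynamic analysis observed.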
Always stay safe when investigating potential malware, and only do so in a controlled and isolated environment. It is important to keep systems and software up to date to protect against known vulnerabilities that malware often exploits. This tutorial is for educational purposes only. Please take full responsibility while handling dangerous malicious files.
RTF Research
In this blog, we will investigate sha256: ed248657afc15600a6b8e5b9cfa94203f9bfeda0ebd1a3007356e99836adeddf
Threat Intelligence:
The first stage will be reviewing the file in VirusTotal to get reputation and information about the file.
We can observe that the file is detected as malicious by 32 engines, and the popular threat label is a trojan associated with CVE-2017-11882 or CVE-2018-0802.
Dynamic Analysis:
We viewed the file in a VMRay sandbox environment. Based on this sandbox run, winword.exe launches the Equation Editor (CVE-2017-11882 [9]) through RPC, and the exploit downloads lawserhgj5784.exe from the network.
Now let’s review the file from the static analysis point of view and try to reach the same conclusion.
Static Analysis:
To begin our analysis, we use a script called oleid. This script examines OLE files and identifies characteristics that may indicate malicious intent. It can detect the presence of VBA macros and embedded Flash objects. In this case, oleid did not flag anything suspicious.
We also ran rtfobj on the file, and it reported that the file contains “not a well-formed OLE object.” However, this alone is not enough to determine the behavior of the file.
Given the obfuscated nature and the complexity of the file, we’ll employ Yara signatures to enhance our understanding and increase visibility. We run Yara with the ditekshen signatures, and one of the signatures reports a detection for CVE-2017-11882.
Using rtfdump.py, we can see that an Equation 3.0 [9] object is present in the RTF document, in item 4 at nesting level 3.
If we run:
python rtfdump.py ed248657afc15600a6b8e5b9cfa94203f9bfeda0ebd1a3007356e99836adeddf.rtf -s 4 -Hi
We will receive:
If we run it without “-Hi”, we will receive the raw object content, which is outside the scope of this blog post.
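To illustrate what hex-decoding an embedded object involves, here is a minimal Python sketch that decodes hex-encoded object data and reads the start of an OLE 1.0 object header: the version, the format ID, and the length-prefixed class name. It shows the general idea only and is not how rtfdump.py is implemented. The function names and the synthetic sample are hypothetical, and the sketch assumes the input contains only hex digits and whitespace, whereas real samples often add heavier obfuscation.

```python
import binascii
import struct


def decode_objdata(hex_text: str) -> bytes:
    """Remove whitespace and hex-decode the remaining digits."""
    digits = "".join(hex_text.split())
    if len(digits) % 2:
        digits += "0"  # assumption: pad an odd number of digits with a zero
    return binascii.unhexlify(digits)


def read_lp_string(blob: bytes, offset: int):
    """Read a length-prefixed ANSI string; return (text, next_offset)."""
    (length,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    raw = blob[offset:offset + length]
    return raw.rstrip(b"\x00").decode("ascii", "replace"), offset + length


def parse_ole1_header(blob: bytes) -> dict:
    """Parse the first fields of an OLE 1.0 object header (simplified)."""
    ole_version, format_id = struct.unpack_from("<II", blob, 0)
    class_name, _ = read_lp_string(blob, 8)
    return {
        "ole_version": ole_version,
        "format_id": format_id,  # 1 = linked object, 2 = embedded object
        "class_name": class_name,
    }


# Synthetic header built for this example; it is not taken from the analyzed sample.
header = struct.pack("<III", 0x00000501, 2, 11) + b"Equation.3\x00"
hexed = header.hex()
obfuscated = hexed[:5] + "\r\n " + hexed[5:]  # whitespace inserted mid-byte
info = parse_ole1_header(decode_objdata(obfuscated))
print(info)
```

A class name such as Equation.3 in the decoded header is the kind of indicator that points toward the Equation Editor and therefore toward CVE-2017-11882.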
In conclusion, the RTF file format is simple, yet it has many capabilities that malware authors take advantage of. Although CVE-2017-11882 is an old vulnerability, attackers still use it with various modifications.
Content Disarm and Reconstruction (CDR) technology is a great solution to counter this. CDR removes any potential attack vectors from the file, whether or not they turn out to be malicious.
Our next blog post will continue to delve into the RTF file format attacks and how CDR can prevent the attack.
References
[1] Ran Dubin, “Content Disarm and Reconstruction of RTF Files a Zero File Trust Methodology,” in IEEE Transactions on Information Forensics and Security, vol. 18, pp. 1461-1472, 2023, doi: 10.1109/TIFS.2023.3241480.
[2] Chintan Shah, An Inside Look into Microsoft Rich Text Format and OLE Exploits, https://www.mcafee.com/blogs/other-blogs/mcafee-labs/an-inside-look-into-microsoft-rich-text-format-and-ole-exploits/
[3] Omri Herscovici, Microsoft Word Intruder RTF Sample Analysis, https://blog.checkpoint.com/research/microsoft-word-intruder-rtf-sample-analysis/
[4] MalwareBazaar, Free Malware Repository, https://bazaar.abuse.ch/browse/
[5] OleTools, https://github.com/decalage2/oletools
[6] Yara, The pattern matching Swiss knife for malware researchers, https://virustotal.github.io/yara/
[7] Ditekshen, Yara signatures, https://github.com/ditekshen/detection
[8] Didier Stevens, rtfdump, https://github.com/DidierStevens/DidierStevensSuite/blob/master/rtfdump.py
[9] CVE-2017-11882, https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2017-11882