[SAOC 2024] SARIF Output Analysis and Documentation - Weekly Update #2

Royal Simpson Pinto royalpinto007 at gmail.com
Mon Sep 30 05:23:42 UTC 2024


## Summary of Progress (September 23 – September 29)

In this second week of Milestone 1, I continued my focus on the
DMD compiler with an emphasis on **SARIF output**. My work this
week centered on analyzing the SARIF output generated by **GCC**
and **Clang**, documenting the results, and working out how they
can inform DMD’s SARIF integration. I also documented the key
differences between `physicalLocation` and `logicalLocations`,
which will be included in the DMD developer documentation once
reviewed.

---

### **What I Worked On:**

#### 1. **Analyzing SARIF Outputs from GCC and Clang**:
I compiled small test programs with both **GCC** and **Clang** to
generate SARIF output. The goal was twofold:
- To identify recurring SARIF output patterns and validate them
with the **SARIF Viewer** extension in VS Code.
- To investigate whether SARIF requires each error to have a
unique error code (carried in a result's `ruleId` property). My
analysis showed that unique codes are not required, but they are
recommended for clarity.

This matters because it shapes the **SARIF integration** for DMD:
DMD does not need to assign every error a unique code, but it
should include codes where it can, since they make results easier
to identify and group.
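To make the error-code question concrete, here is a minimal Python sketch that groups the results of a SARIF log by `ruleId` and counts results that carry no code at all. The sample log is hand-written for illustration (it is not real GCC or Clang output), and the function name `summarize_rule_ids` is hypothetical.

```python
import json
from collections import Counter

# Hand-written sample SARIF log, for illustration only; real GCC/Clang
# output contains many more properties.
SAMPLE_LOG = """
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "example-compiler"}},
      "results": [
        {"ruleId": "-Wunused-variable", "level": "warning",
         "message": {"text": "unused variable 'x'"}},
        {"ruleId": "-Wunused-variable", "level": "warning",
         "message": {"text": "unused variable 'y'"}},
        {"level": "error",
         "message": {"text": "expected ';' before '}' token"}}
      ]
    }
  ]
}
"""


def summarize_rule_ids(log: dict) -> tuple[Counter, int]:
    """Count results per ruleId, plus how many results have no ruleId."""
    per_rule: Counter = Counter()
    missing = 0
    for run in log.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId")
            if rule_id is None:
                # No code: allowed by the schema, but harder to group.
                missing += 1
            else:
                per_rule[rule_id] += 1
    return per_rule, missing


if __name__ == "__main__":
    counts, without_code = summarize_rule_ids(json.loads(SAMPLE_LOG))
    for rule, n in counts.items():
        print(f"{rule}: {n}")
    print(f"results without ruleId: {without_code}")
```

A check like this could be run over real compiler output to see how consistently each tool assigns codes.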

#### 2. **Documenting Findings for DMD SARIF Integration**:
I began documenting the SARIF patterns observed in GCC and Clang
output, focusing on how errors are reported (an example can be
found [here](https://docs.google.com/document/d/1Hl0Zbmr93XpapSubd8tLOIIunNfsBFM-DJjWj0BoaJ4/edit?usp=sharing)).
A key part of this documentation is a real example of an error
scenario, which will help clarify how to map error outputs to the
SARIF format in the DMD compiler.
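As a rough illustration of that mapping, the Python sketch below turns one compiler diagnostic line into a SARIF `result` object. It assumes DMD's classic `file.d(LINE): Kind: message` shape with an optional `,COL` after the line number; the regex, the `LEVELS` table (including mapping deprecations to `"warning"`), and the function `dmd_line_to_sarif_result` are all hypothetical and are not how DMD itself reports errors.

```python
import re
from typing import Optional

# Assumed diagnostic shape: "file.d(LINE): Kind: message" or
# "file.d(LINE,COL): Kind: message". Illustrative only.
DIAG_RE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+)(?:,(?P<col>\d+))?\): "
    r"(?P<kind>Error|Warning|Deprecation): (?P<msg>.*)$"
)

# Hypothetical mapping from message kinds to SARIF result levels.
LEVELS = {"Error": "error", "Warning": "warning", "Deprecation": "warning"}


def dmd_line_to_sarif_result(line: str) -> Optional[dict]:
    """Convert one diagnostic line into a SARIF result, or None if it doesn't match."""
    m = DIAG_RE.match(line)
    if m is None:
        return None
    region = {"startLine": int(m["line"])}
    if m["col"] is not None:
        region["startColumn"] = int(m["col"])
    return {
        "level": LEVELS[m["kind"]],
        "message": {"text": m["msg"]},
        "locations": [
            {
                "physicalLocation": {
                    "artifactLocation": {"uri": m["file"]},
                    "region": region,
                }
            }
        ],
    }


if __name__ == "__main__":
    print(dmd_line_to_sarif_result("app.d(3,5): Error: undefined identifier `x`"))
```

An in-compiler implementation would more likely build these objects directly from the diagnostic data rather than re-parsing text; parsing is used here only to keep the sketch self-contained.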

#### 3. **Difference Between `physicalLocation` and `logicalLocations`**:
Following my mentor’s guidance, I analyzed the difference between 
`physicalLocation` and `logicalLocations` in SARIF outputs:
- **GCC** provides both `physicalLocation` (the exact 
file/line/column) and `logicalLocations` (the function or class 
context).
- **Clang**, on the other hand, mostly includes 
`physicalLocation` without additional logical context.
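The distinction shows up directly in the shape of a SARIF `location` object. The minimal Python sketch below builds one: `physicalLocation` holds the file and region, while the optional `logicalLocations` array names the enclosing construct (here a function, similar in spirit to what GCC emits). The helper `make_location` and the sample values are hypothetical.

```python
import json
from typing import Optional


def make_location(uri: str, line: int, column: int,
                  function: Optional[str] = None) -> dict:
    """Build a SARIF location; logicalLocations is added only when known."""
    location = {
        # Where the problem is in the source text.
        "physicalLocation": {
            "artifactLocation": {"uri": uri},
            "region": {"startLine": line, "startColumn": column},
        }
    }
    if function is not None:
        # Which program construct the problem belongs to.
        location["logicalLocations"] = [
            {"name": function, "fullyQualifiedName": function, "kind": "function"}
        ]
    return location


if __name__ == "__main__":
    # Both kinds of location, as in the GCC-style output.
    print(json.dumps(make_location("main.c", 4, 9, function="main"), indent=2))
    # Physical location only, as in the Clang-style output.
    print(json.dumps(make_location("main.c", 4, 9), indent=2))
```

For DMD, `fullyQualifiedName` could plausibly carry the module-qualified D symbol (for example `app.main`), but that is an assumption about a possible design, not a decision.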

I documented these findings in my own repository for now
([here](https://github.com/royalpinto007/d-drafts/blob/main/physicalvslogical.md));
after my mentor's review, they will be added to the DMD developer
docs to guide other contributors working with SARIF.

#### 4. **Resources and Research**:
I used several resources for my analysis and documentation,
including the **[SARIF tutorials](https://github.com/microsoft/sarif-tutorials/tree/main/docs)**
and the **SARIF Viewer [extension](https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer) for VS Code**.
They helped me interpret SARIF outputs and check them against the
specification.

---

### **Challenges:**

#### 1. **Navigating Differences Between GCC and Clang**:
Although both compilers emit SARIF, they report errors
differently, particularly in how they use `logicalLocations`, so I
had to adapt the documentation and integration plan accordingly.
Understanding these differences took some time, but it will pay
off when implementing SARIF support in DMD.

#### 2. **Understanding SARIF Structure**:
Fully grasping the SARIF schema, especially in terms of which 
fields are required and which are optional, was initially 
challenging. However, after going through the SARIF tutorials and 
examining real outputs, I now have a clearer understanding of how 
to structure SARIF output for the DMD compiler.
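As a compact reference for the required-versus-optional question, the Python sketch below checks the few properties that, to my understanding of the SARIF 2.1.0 JSON schema, are required along the path from the log to each result: `version` and `runs` on the log, `tool.driver.name` on each run, and `message` on each result. Properties such as `ruleId`, `level`, and `locations` are optional. This is a hand-rolled sanity check under those assumptions, not a substitute for validating against the official schema; `MINIMAL_LOG` and `missing_required_fields` are hypothetical names.

```python
# A minimal log that should satisfy the required properties checked below.
MINIMAL_LOG = {
    "version": "2.1.0",
    "runs": [
        {
            "tool": {"driver": {"name": "dmd"}},
            "results": [{"message": {"text": "undefined identifier `x`"}}],
        }
    ],
}


def missing_required_fields(log: dict) -> list[str]:
    """Return paths of (assumed) required SARIF 2.1.0 properties that are missing."""
    problems = []
    if log.get("version") != "2.1.0":
        problems.append("version (expected '2.1.0')")
    runs = log.get("runs")
    if not isinstance(runs, list):
        problems.append("runs")
        return problems  # nothing further to check without runs
    for i, run in enumerate(runs):
        driver = run.get("tool", {}).get("driver", {})
        if "name" not in driver:
            problems.append(f"runs[{i}].tool.driver.name")
        for j, result in enumerate(run.get("results", [])):
            if "message" not in result:
                problems.append(f"runs[{i}].results[{j}].message")
    return problems


if __name__ == "__main__":
    print(missing_required_fields(MINIMAL_LOG))
```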

---

### **Next Week’s Plan:**

- Complete the documentation on **SARIF integration** for DMD, 
including the key differences between `physicalLocation` and 
`logicalLocations`, and finalize the error output example.
- Begin working on integrating SARIF support into DMD, focusing 
on mapping the compiler’s error reporting system to the SARIF 
schema.
