Through these vulnerabilities, an attacker on the same network (or one controlling a malicious NFS server) could gain code execution on the U-Boot powered device. The first two occurrences of the vulnerability were plain memcpy overflows in which the size came straight from the network packet, under the attacker's control and without any validation. The memcpy function copies n bytes from memory area src to memory area dest. This is unsafe when the size is not validated against the destination buffer, because the attacker then controls both the data and the length being copied.
U-Boot contains hundreds of calls to memcpy and to functions such as ntohl and ntohs, which convert values read from the network into host byte order. In this challenge, you will use CodeQL to find those calls. Of course many of those calls are safe, so throughout this challenge you will refine your query to reduce the number of false positives.
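As a rough illustration of the pattern described above, here is a minimal C sketch. It is not taken from U-Boot: the packet layout demo_packet, the buffer size BUF_SIZE, and both function names are assumptions made up for this example. It contrasts a memcpy whose length comes straight from the packet with a version that validates the length first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> /* ntohs, htons (POSIX) */

#define BUF_SIZE 64 /* hypothetical destination buffer size */

/* Hypothetical packet layout: a 16-bit big-endian length, then the payload. */
struct demo_packet {
    uint16_t len_be;        /* attacker-controlled, network byte order */
    uint8_t payload[1500];
};

/* Unsafe pattern: the length from the packet is used without validation. */
void copy_unchecked(uint8_t dest[BUF_SIZE], const struct demo_packet *pkt)
{
    uint16_t len = ntohs(pkt->len_be);
    memcpy(dest, pkt->payload, len); /* writes past dest when len > BUF_SIZE */
}

/* Safer pattern: reject lengths that do not fit either buffer.
 * Returns 0 on success and -1 when the length is rejected. */
int copy_checked(uint8_t dest[BUF_SIZE], const struct demo_packet *pkt)
{
    uint16_t len = ntohs(pkt->len_be);
    if (len > BUF_SIZE || len > sizeof(pkt->payload))
        return -1;
    memcpy(dest, pkt->payload, len);
    return 0;
}
```

Calling copy_checked with a packet whose len_be is htons(1000) returns -1, whereas copy_unchecked would write 1000 bytes into a 64-byte buffer.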
Upon completion of the challenge, you will have a query that is able to find many of the vulnerabilities that allow for remote execution of arbitrary code on U-Boot powered devices.
2 Step 0: Finding the definition of memcpy, ntohl, ntohll, and ntohs
2.1 Find all definitions of functions named strlen
Copy the following code into 3_function_definitions.ql:
import cpp
from Function f
where f.getName() = "strlen"
select f, "a function named strlen"
Select 3_function_definitions.ql, then right-click -> CodeQL: Run Queries in Selected Files.
Question 0.0: Can you work out what the above query is doing?
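It finds every function named strlen. Judging by the results, these include both function definitions and function declarations.
import cpp: imports the CodeQL standard library for C/C++
from Function f: declares a variable f of class Function
where f.getName() = "strlen": f.getName() returns the name of f, so only a Function whose name equals strlen is selected
select f, "a function named strlen": select chooses the columns to show in the results, separated by commas. Yes, much like SQL.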
2.2 Find all definitions of functions named memcpy
Question 0.1: Modify the query to find the definition of memcpy.
import cpp
from Function f
where f.getName() = "memcpy"
select f, "a function named memcpy"
2.3 Find all function or macro definitions named ntohl, ntohll, and ntohs
Question 0.2: ntohl, ntohll, and ntohs can either be functions or macros (depending on the platform where the code is compiled).
As these snapshots for U-Boot were built on Linux, we know they are going to be macros. Write a query to find the definition of these macros.
Hint: The CodeQL Query Console has an auto-completion feature. Hit Ctrl-Space after the from clause to get the list of objects you can query. Wait a second after typing myObject. to get the list of methods.
Hmm... the Query Console? Ctrl-Space?
Hint: We can use a regular expression to write a query that searches for all three macros at once.
Using a regular expression, query the definitions of all three macros at once:
import cpp
from Macro m
where m.getName().regexpMatch("ntoh(l|ll|s)")
select m, "ntohl, ntohll, and ntohs"
Alternatively, query them with a set literal expression:
import cpp
from Macro m
// General form: where <your_variable_name> in ["bar", "baz", "quux"]
where m.getName() in ["ntohs", "ntohl", "ntohll"]
select m, "ntohl, ntohll, and ntohs"
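To see why this step queries Macro rather than Function, it may help to look at how a byte-swap helper can be written in either form in C. The sketch below uses made-up names (demo_swap16, DEMO_NTOHS, read_port) and is not the actual U-Boot or libc definition. It also assumes a little-endian host, where converting from network order means swapping bytes.

```c
#include <stdint.h>

/* Function form: this definition exists in the compiled program,
 * so a query over functions can find it by name. */
static inline uint16_t demo_swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Macro form: the preprocessor replaces every use with its expansion,
 * so there is no function named DEMO_NTOHS to be found.
 * Assumes a little-endian host, where network order must be swapped. */
#define DEMO_NTOHS(x) demo_swap16(x)

/* Hypothetical caller: the body is a macro invocation whose expansion
 * is a call to demo_swap16. */
uint16_t read_port(uint16_t port_be)
{
    return DEMO_NTOHS(port_be);
}
```

In this sketch, a search for a function named DEMO_NTOHS would come back empty, while a search over macros would find its definition and each place it is invoked.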
3 Step 1: Finding the calls to memcpy, ntohl, ntohll, and ntohs
3.1 Find all calls to memcpy
Question 1.0: Find all the calls to memcpy.
Hint: Use the auto-completion feature on the function call variable to guess how to express the relation between a function call and a function, and how to bind them.
import cpp
from FunctionCall fc
// FunctionCall.getTarget() returns a Function: the function that the call fc invokes
where fc.getTarget().getName() = "memcpy" // holds if the function called by fc is named memcpy
select fc
3.2 Find all calls to ntohl, ntohll, and ntohs
Question 1.1: Find all the calls to ntohl, ntohll, and ntohs.
Hint: calls to ntohl, ntohll, and ntohs are macro invocations, unlike memcpy which is a function call.
import cpp
from MacroInvocation mi
where mi.getMacro().getName().regexpMatch("ntoh(l|ll|s)")
select mi.getExpr()
The results are as follows:
4 Step 2: Data flow analysis
For this step, we want to detect cases where data read from the network ends up being used by a call to memcpy. To do this, we'll use the CodeQL taint tracking library and its predicate hasFlowPath, which tells us when data coming from a source flows to a sink. Use the boilerplate provided below to complete your taint tracking query.
Question 2.0: Write a QL class that finds all the top-level expressions associated with the macro invocations to the calls to ntohl, ntohll, and ntohs.
Hint: Querying this class should give you the same results as in question 1.1
import cpp
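// To define a class:
// 1. Use the class keyword
// 2. The class name must start with an uppercase letter
// 3. Declare the supertypes with the keyword extends or instanceof
// 4. Enclose the class body in braces
class MyMacroInvocation extends MacroInvocation { // this class extends MacroInvocation
MacroInvocation mi; // field holding a macro invocation
MyMacroInvocation() { // characteristic predicate, similar to a constructor
// mi must satisfy the condition below, and this must equal mi
mi.getMacro().getName().regexpMatch("ntoh(l|ll|s)") and this = mi
}
}
from MyMacroInvocation mmi
select mmi.getExpr() // the expression of each macro invocation that satisfies the condition above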
Of course, a class that extends Expr can be defined in a similar way:
// Solution 3:
import cpp
class MyExpr extends Expr {
MacroInvocation mi;
MyExpr(){
mi.getMacro().getName().regexpMatch("ntoh(l|ll|s)") and this = mi.getExpr()
}
}
from MyExpr me
select me, "33333"
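The same class can also be written with exists instead of a field. Note that the regular expression "ntoh.*" used here is broader than "ntoh(l|ll|s)": it matches any macro whose name starts with ntoh.
import cpp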
class NetworkByteSwap extends Expr {
NetworkByteSwap() {
exists(MacroInvocation mi | mi.getMacro().getName().regexpMatch("ntoh.*") | mi.getExpr() = this)
}
}
from NetworkByteSwap n
select n, "Network byte swap"
Question 2.1: Create the configuration class, by defining the source and sink. The source should be calls to ntohl, ntohll, or ntohs. The sink should be the size argument of an unsafe call to memcpy.
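To make the source and sink roles concrete, here is a small C sketch with the corresponding expressions annotated. The header layout demo_hdr, the constant DEMO_HDR_SIZE, and the function demo_copy_body are hypothetical and serve only as illustration. The value returned by ntohl is the source, it passes through arithmetic and an assignment, and it finally reaches the third (size) argument of memcpy, which is the sink.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> /* ntohl, htonl (POSIX) */

#define DEMO_HDR_SIZE 4u /* hypothetical header size in bytes */

/* Hypothetical header: total message length in network byte order. */
struct demo_hdr {
    uint32_t total_len_be;
};

/* Copies the message body into dest and returns the number of bytes
 * copied, or 0 if the length is inconsistent or does not fit. */
size_t demo_copy_body(uint8_t *dest, size_t dest_size,
                      const struct demo_hdr *hdr, const uint8_t *body)
{
    uint32_t total = ntohl(hdr->total_len_be); /* SOURCE: value from the packet */
    if (total < DEMO_HDR_SIZE)
        return 0;
    uint32_t body_len = total - DEMO_HDR_SIZE; /* taint flows through arithmetic */
    size_t n = body_len;                       /* ... and through assignment */
    if (n > dest_size)                         /* bounds check makes this call safe */
        return 0;
    memcpy(dest, body, n);                     /* SINK: the size argument n */
    return n;
}
```

A plain source-to-sink taint query would likely still report this call even though the bounds check makes it safe, which is the kind of false positive the later refinement steps aim to remove.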
/**
* Holds if data may flow from `source` to `sink` for this configuration.
*
* The corresponding paths are generated from the end-points and the graph
* included in the module `PathGraph`.
*/
That is, the corresponding paths are generated from the end-points and the graph in the PathGraph module.
Next, let's look at a subclass of the Configuration class:
class Config extends TaintTracking::Configuration {
Config() { this = "NetworkToMemFuncLength" }
override predicate isSource(DataFlow::Node source) {
// 2.1 Todo
}
override predicate isSink(DataFlow::Node sink) {
// 2.1Todo
}