Backdoor Detection via Functionality Profiling

(in embedded device firmware)

Sam L. Thomas, Tom Chothia, Flavio D. Garcia, Christopher Green
University of Birmingham

Overview

  • COTS embedded device security is a disaster:
    • Poor coding practices;
    • Internet-facing “debug” interfaces;
    • Hard-coded credential checks;
    • Additional, hidden services.

Overview (cont.)

  • Lots of devices, lots of firmware;
  • Heterogeneous in their architecture (ARM, MIPS, PPC, etc.);
  • Multiple firmware versions for each device;
  • Manual analysis takes significant time and a degree of expertise.

Overview (cont.)

  • Would like to perform large-scale, lightweight analysis of firmware;
  • But… predicting program behaviour is not possible in the general case (ex. Halting problem);
  • Would like to check firmware for potential anomalous behaviour (ex. backdoors);
  • Focus on devices with largest market share (embedded Linux-based systems).

Overview (cont.)

  • Backdoors are hard to find – not many examples;
  • Modern “malware” analysis utilises classifiers derived from machine learning algorithms:
    • Require large enough data set of benign/malicious examples: not possible for backdoors.
    • How to apply to backdoor detection?

Proposed System

  • A classifier to infer the “class” of a given executable;
  • A domain-specific language to encode expected executable behaviours as functionality profiles (that can be validated);
  • (Due to static analysis) optimistically attempt to validate the usage of identified anomalous executables.

Machine learning

Classifier

Essentially learn a function:

classify : Executable → ExecutableClass

Feature Selection

  • High-level homogeneous attributes consistent amongst multiple architectures:
    • Imported and exported functions;
    • Strings;
  • Apply standard techniques (part of machine learning package – WEKA) to select features with highest utility (Information Gain and Information Gain Ratio).

Difficulties

  • Extraction of executables from binary blobs is imprecise (with current tools):
  • For the features selected, this matters: additional strings distort the input to the training algorithm;
  • Develop an algorithm to rebuild ELF executable prior to feature extraction.

Feature Vectors

Input to the learning algorithm is a vector of booleans encoded as integers with a classification label:

 < 1, 0, 1, 1, 1, 0, 1, ..., web-server > 

Each boolean represents the presence of either an import name, export name, or string within the processed executable and the classification is the general “class” of the executable.

Classifier Choice

Evaluated multiple algorithms; best (w.r.t. speed and precision):

  • C4.5 (Decision Trees) ~96% (~95% against separate test set)
  • VFI (Voting Feature Intervals) ~70%

Performance of chosen classifier (C4.5):

  • Omitting strings: ~93% (~63% against separate test set)
  • Omitting function imports/exports: ~50% (~40% against separate test set)

Binary Functionality Description Language

Interpreter

profile : Executable × ExecutableClass → {true, false}

Specification

<top-level> ::= rule <ident>(<args>) = <expr>                  <rule> ::= <base-rule>
              | import <string>                                         | <ident>

     <expr> ::= <rule>(<values>)                          <base-rule> ::= import_exists
              | let <ident> = <expr> in <expr>                          | export_exists
              | if <expr> then <expr> else <expr>                       | string_exists
              | ! <expr>                                                | function_ref
              | <expr> <binary-op> <expr>                               | string_ref
              | <value> <comp-op> <value>                               | architecture
              | forall <ident>(<values>) => <expr>                      | endianness
              | exists <ident>(<values>) => <expr>

    <value> ::= <const>                                        <const> ::= <bool>
              | <ident>                                                  | <int>
              | <value> <arith-op> <value>                               | <string>

<variables> ::= <variable>                                      <args> ::= ε
              | <variable> , <variables>                                 | <variables>

 <arith-op> ::= + | - | * | / | % | & | ^ | | | ~ | << | >>   <values> ::= <value>
  <comp-op> ::= == | != | < | > | <= | >=                                | <value> , <values>
 <logic-op> ::= || | && | ^^

Predicates

  • Verification of symbol existence:
    • import_exists/export_exists/string_exists;
  • Verification of symbol use within code/data:
    • function_ref/string_ref;
  • Architecture meta-data:
    • architecture/endianness.

Predicates (cont.)

  • Quantified predicates over identified function arguments:
    • exists <function-name> (arg0 : type0, …) => …
    • forall <function-name> (arg0 : type0, …) => …
  • …evaluate to true if the expression specified after => evaluates to true with the estimated arguments passed to the function inserted into its evaluation context.

Examples

exists puts(msg: string) => msg == "Hello, World"

…evaluates to true if at least one use of the function puts is passed “Hello, World” as an argument.

Examples (cont.)

Profile of a strictly TCP service

import "prelude"

rule tcp_only() = tcp()
               && !udp()

Profile of a (rather odd) UDP-based service

import "prelude"

rule picky_udp() = outgoing_udp()
                && !incoming_udp()

Simple HTTP server

import "prelude"

rule httpd() =
 -- No incoming/outgoing UDP traffic (from prelude)
    !udp() 

 -- Incoming/outgoing TCP traffic (from prelude)
 && tcp()  

 -- Expect to listen on port 80 and/or port 443 (SSL/TLS)
 && forall htons(x: int) => (x == 80 || x == 443)             

 -- May read/write files (from prelude)
 && read_write_filesystem()

Validation

Executable usage

  • Firmware is never executed; how to know if a given executable is even used?
  • Static analysis is imperfect and only an estimation:
    • Overestimate: more false positives – hence more paranoia;
    • Underestimate: less false positives – might miss obvious cases;
    • No concrete guarantees in either case.

Estimation of executables used

  1. Scan firmware image for “default” boot/init scripts
  2. Parse shell scripts:
    1. Handle further shell scripts by (2)
    2. Handle executables by scanning for calls to system; handling shell scripts by (2) and executables by (4)

Assumptions

  • All conditional branches in scripts are possible;
  • All calls to system with static strings as arguments in executables are possible;
  • All calls to system without static strings are not considered.

BackScan

Tool Overview

  1. Unpack firmware (uses existing methods)
  2. For each executable:
    1. Classify executable;
    2. Attempt to match the estimated functionality against the corresponding profile;
    3. If anomalous attempt to establish the usage of said executable;
    4. Report anomalies to analyst.

Results

Classifier

  • Performance:
    • On training data using 10-fold cross validation: ~96%;
    • On separate test data (460 firmware images/~18,000 executables): ~95%;
  • Weighted average performance over all classes (correct to 3 s.f.):
    • TP rate: 0.966;
    • FP rate: 0.000;
    • Precision: 0.956;
    • Recall: 0.966.

Classifier (cont.)

Performance on common services (correct to 3 s.f.):

Service TP rate FP rate
web-server 0.957 0.001
ftp-server 0.956 0.001
ssh-daemon 0.960 0.000
telnet-daemon 0.976 0.000
busybox 0.996 0.000

Artificial instances

  • Developed new backdoor: embeddable remote control service;
  • Inserted into common executables found within firmware:
    • mini_httpd;
    • utelnetd;
  • Successfully identified modified executables as anomalous even with varying optimisation levels applied and differing target architectures (ARM and MIPS).

Real-world examples

  • Overall majority of executables checked conformed to their functionality profile (quite a positive result);
  • The majority of those that didn’t were benign, but contained unexpected functionality:
    • Web-server with a built-in DNS resolver.

Real-world examples (cont.)

  • Identified previously undocumented functionality in some D-Link web-servers:
    • Allows for unauthenticated device reconfiguration with shell access to the device;
  • Identified documented backdoor present in a number of instances of Tenda router firmware.

Security Analysis

…or is it possible to evade BackScan?

Sometimes.

  • Static linking binaries removes certain meta-data:
    • Can no longer identify imported library functions;
  • Insertion of superfluous meta-data (e.g. strings) to fool the classifier:
    • Executable may be identified as some other class of executable;
  • Both cases increase the final workload for the analyst but will be detected by the tool as anomalous (at the cost of more False Positives).

Complications

  • Functionality of some executables is very general (ex. Busybox):
    • Impossible to derive a meaningful functionality profile;
  • Misses environmental features that can influence configuration:
    • Listening port numbers for various server software stored within configuration files, etc.;
  • Both are beyond the lightweight analysis techniques proposed.

Future Work

  • Automatic derivation of functionality profiles based on a given set of known non-anomalous example executables for a given class;
  • Construct more complex functionality profiles for more classes of executable.

Questions