Jan 292015

History, always seems to repeat itself…

I always look forward to new updates of gaming platforms, they general come with an increase in processing and graphical power which pretty much equates into opportunities for us developers to do more in our games. What I don’t look forward to however is hearing phrases like the following being bantered around at every major update;

“We won’t have to worry about memory anymore.”

“X times more powerful CPU, why bother with optimization.”

“We will no longer need to limit the number of polygons in our models.”

and my favorite by far:

“No more limits on texture size.”

I usually get to hear these at every major generation update, I heard them when we went from the PS1 to PS2 and XBox (original), then again to the PS3 and XBox 360 and just recently as we switch to 64-bit technology and the XBox One and PS4. There is a part of me that finds this hilarious that as an industry we never seem to learn from the past but there is also the other part of me that has to deal with the fall out of people thinking along these lines.

There are two sentences that one day I might chisel into a stone plaque and nail to the office door:

There will never be enough processing power.

There will never be enough memory.


Almost every project will hit one of these walls and every medium or larger project will hit both of them. Its always good to have plans prepared for dealing with these events when they occur however most projects only deal when the problems actually surface.

So lets consider what we can do about the first one on the list, CPU performance;

The essence of the game loop is to process the user’s input, simulate the AI and game objects, update the world and render the visuals. The sooner this is done the sooner you can repeat the cycle and the more times you can do this in a second the smoother and more pleasant the gaming experience will be for the player. This is more commonly referred to as the Frames Per Second (FPS) and it is part of the reason a development build should always be displaying the FPS, you can see at a glance potential issues as early as possible.

Once we are aware of a performance issue we need to narrow it down, to focus our optimization effort to the specific code that will give us the most benefit. After all we generally don’t have unlimited time and resources at our disposal and we want to make sure that we are making good optimizations. This is the point where we need to generate some metrics, metrics can tell us where the specific issues are. After we have optimized the code and improved the performance, the metrics can then tell us how much of an improvement in performance we have made.

The rest of this article is focused on a system for gathering metrics, what is presented here is the basic bare minimum and we will build upon it in the next article.

The project files for this article can be found here: Unity-13.zip

Before continuing I should mention the built in Profiler available to Unity Pro users, this is a great tool however there are a few issues I have with it… It only works when you have a hard connection to the target platform and the biggest issue is that the granularity is fixed. If you wanted to isolate performance metrics for a specific section of code or a system the profiler can be miss-leading. This doesn’t mean that the profiler is useless far from it, it is still an invaluable tool however the following system presented allows you to create focused metrics.

A Profiler

In order to measure the time it takes to execute a piece of code we need a timer that is constantly being updated, in unity almost all of the Time values only update between frames and as such are not very useful for performance measuring i.e. Time.time, Time.elapsedtime, etc. There is one timer however that does give us what we want, which is Time.realtimeSinceStartup. This value is always being updated and hence reading subsequent values will give us the ability to measure the time it has taken to execute a section of code.

   float start_time = Time.realtimeSinceStartup;

   // do some processing here

   float time_taken = Time.realtimeSinceStartup - start_time;

This is the basics of measuring the execution duration, however there are two traps that you should be aware of;

1. Don’t gather metrics from a single execution loop, we need to gather multiple frames of data and draw averages to get a realistic idea of the cost of a function.

2. Don’t measure too small a piece of code, you just won’t get consistent values.

To make our life easier, lets create a new manager to be our go to point for gathering and displaying metrics.

using UnityEngine;
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public class ProfileCPU : MonoBehaviour
   static ProfileCPU Singleton = null;

   // Set this to be your desired FPS, all percentages will be based from this target.
   public int m_TargetFPS = 30;
   public bool m_DisplayStats = false;

   // Storage space for our profiling functions.
   private Dictionary<String, ProfileCPUData> m_ProfileMarkers;

   // A tick counter for the number of frame metrics that have been collected.
   private int m_NumberOfFrames;

   public static ProfileCPU Instance
      get { return Singleton; }

   void Awake()
      if (Singleton != null)
         UnityEngine.Debug.LogError("Multiple Performance Singletons exist!");

      Singleton = this;

      m_ProfileMarkers = new Dictionary<String, ProfileCPUData>();

If most of the above code seems similar to our previous managers then you are not wrong, we are after all using the same pattern.

What is unique to this manager however are the two variables; m_TargetFPS and m_ProfileMarkers.

Set the m_TargetFPS to be the Frames Per Second you are targeting for your game, you might consider making this platform specific for example on the PC you could aim for 60fps and for mobile 30fps, either way what ever value you enter here will effect the calculations of percentages we shall see later in the metric output.

The m_ProfileMarkers, is a dictionary of our gathered metric data, essentially the performance metric id (string) is used to look up the specific ProfileCPUData.

In the manager we have two functions that are the main interface from the game code to gather metric data;

   public void StartRecording( String id )
      if ( !m_ProfileMarkers.ContainsKey( id ))
         m_ProfileMarkers.Add( id, new ProfileCPUData(id) );


   public void StopRecording(String id)
      if (!m_ProfileMarkers.ContainsKey(id))
         m_ProfileMarkers.Add(id, new ProfileCPUData(id));


Both functions work by finding the class referenced by the id (Dictionary’s are excellent for this sort of thing), if it doesn’t find the unique id it will create a new entry.

The actual metric information is processed internally in the ProfileCPUData start and stop implementations which we will get to later.

The game code would accesses the manager in the following fashion, make sure that for every Start you have a Stop and you don’t call Start  and Stop out of sequence. The code will output a log message if you are out of sync.

   ProfileCPU.Instance.StartRecording( "AI Simulate" );
   // do some processing here

   ProfileCPU.Instance.StopRecording( "AI Simulate" );

The update function in the manager is used to tick the frame and if a request to display the currently gathered metrics is made then it will output the current status to the console.

   public void Update()

      // Iterate all current markers and reset for a new frame.
      foreach ( KeyValuePair<String, ProfileCPUData> data in m_ProfileMarkers )
         ProfileCPUData CPU_data = data.Value;


      // Has the user requested the current stats to be displayed?
      if ( m_DisplayStats )
         m_DisplayStats = false;

Lets take a look at the ProfileCPUData class now;

using UnityEngine;
using System;
using System.Collections;
using System.Diagnostics;
using System.IO;

public class ProfileCPUData
                     // The profile marker id.
   private string    m_Identifier;
                     // Indicates if a recording is currently in progress.
   private bool      m_Active;
                     // The start time of the current recording.
   private float     m_StartTime;

                     // Time spent processing this frame.
   private float     m_TimeSpent;
                     // Total number of calls this frame.
   private int       m_NumberOfCalls;

                     // Time metrics.						
   private float     m_TotalTime;
   private float     m_PeakTime;
   private float     m_LastFrameTime;

                     // Call metrics.
   private int       m_TotalCalls;
   private int       m_PeakCalls;
   private int       m_LastFrameCalls;

   public string     Id               { get { return m_Identifier; } }

   public float      TotalTime        { get { return m_TotalTime; } }
   public float      PeakTime         { get { return m_PeakTime; } }

   public int        TotalCalls       { get { return m_TotalCalls; } }
   public int        PeakCalls        { get { return m_PeakCalls; } }

   public float      LastFrameTime    { get { return m_LastFrameTime; } }
   public int        LastFrameCalls   { get { return m_LastFrameCalls; } }

   public ProfileCPUData( string identifier )
      m_Identifier = identifier;
      m_Active = false;
      m_StartTime = 0.0f;
      m_TimeSpent = 0.0f;
      m_NumberOfCalls = 0;

      m_TotalTime = 0.0f;
      m_PeakTime = 0.0f;
      m_LastFrameTime = 0.0f;

      m_TotalCalls = 0;
      m_PeakCalls = 0;
      m_LastFrameCalls = 0;

   public void Start( )
      if ( m_Active )
         Dbg.Warning("Start / Stop Mis-match on id: " + m_Identifier );

      m_Active = true;
      m_StartTime = Time.realtimeSinceStartup;

   public void Stop( )
      if ( !m_Active )
         Dbg.Warning("Start / Stop Mis-match on id: " + m_Identifier);

      m_TimeSpent += (Time.realtimeSinceStartup - m_StartTime);
      m_Active = false;

   public void FrameEnd()
      // Update the Time stats.
      m_TotalTime += m_TimeSpent;

      if ( m_PeakTime < m_TimeSpent )
         m_PeakTime = m_TimeSpent;

      m_LastFrameTime = m_TimeSpent;

      // update the call stats
      m_TotalCalls+= m_NumberOfCalls;
      if ( m_PeakCalls < m_NumberOfCalls )
         m_PeakCalls = m_NumberOfCalls;

      m_LastFrameCalls = m_NumberOfCalls;

      // Reset for a new frame.
      m_Active = false;
      m_TimeSpent = 0.0f;
      m_NumberOfCalls = 0;

It is essentially a container of metrics with a start, stop and a frame reset and that is it (at least for this initial version). Hopefully nothing scary or confusing in there.

The only function left is the Display Stat function for the manager which looks like this;

   private void DisplayStats()
      Dbg.Trace( " Frames: " + m_NumberOfFrames );
      Dbg.Trace( " Id Calls [Peak/Avg/Last] Time(ms) [Peak/Avg/Last] % of frame" );
      Dbg.Trace( "-----------------------------------------------------------------------------------------" );

      float target_MS = 1000.0f / (float)m_TargetFPS;

      foreach ( KeyValuePair<String, ProfileCPUData> data in m_ProfileMarkers )
         ProfileCPUData CPU_data = data.Value;

         String details = string.Format("{0,20} [{1,4} / {2,4} / {3,4}] - [{4,8:N3} / {5,8:N3}/{6,8:N3}] - {7,8:N3}%",
                           CPU_data.PeakCalls, CPU_data.TotalCalls / m_NumberOfFrames, CPU_data.LastFrameCalls,
                           (CPU_data.PeakTime * 1000.0f),
                           (CPU_data.TotalTime * 1000.0f) / (float) m_NumberOfFrames ,
                           ( CPU_data.LastFrameTime * 1000.0f),
                           ((CPU_data.LastFrameTime * 1000.0f) / target_MS) * 100.0f );

         Dbg.Trace( details );

Now that we have a system for gathering metrics lets get on to the fun stuff, in the project files for this article I created a SimulateProfileData component that can be attached to an object.  In our previous project we had a single rotating cube I’ve expanded upon that and we now have 12 cubes, each with the new component attached.

There are 12 Cubes

The SimulateProfileData component is designed to have some expensive functions, the functions have been configured to gather metrics which we are going to use to test our new manager. Obviously different machines will have different effects however as you can see in the screenshot above I’m getting about 14fps on my machine when I’m trying to target 30fps.

This basic version will output the current gathered metrics when the managers DisplayStats bool is set to true, this could be wired up to a button click or done on a timer or even the number of frames processed. However for now while the game is running simply tick the box in the Inspector to force a dump.

Unity13 - ClickToGetMetrics

The dump of the metrics to Unity’s console can appear quite messy;

Unity13 - Metric Output

thankfully we took steps to help address this when we created our own logging system, remember Unity-11: http://www.thegameengineer.com/blog/2014/02/09/unity-11-logging/

If we take two snapshots of the metrics and then open up the generated log file we will find the data in a much more easily digestible format:

[22:46:54] Creating Temporary App.
[22:47:46]  Frames: 236
[22:47:46]   Id                 Calls [Peak/Avg/Last]    Time(ms) [Peak/Avg/Last]       % of frame
[22:47:46] -----------------------------------------------------------------------------------------
[22:47:46]                  Sim [  16 /   15 /   16] - [ 128.768 /   73.668/  94.193] -  282.578%
[22:47:46]             SimFunc5 [  22 /   12 /   16] - [ 123.102 /   66.241/  84.656] -  253.967%
[22:47:46]             SimFunc3 [  25 /   12 /   14] - [   2.439 /    0.665/   0.912] -    2.735%
[22:47:46]             SimFunc1 [  24 /   13 /   18] - [   0.031 /    0.010/   0.008] -    0.023%
[22:47:46]             SimFunc4 [  26 /   12 /   16] - [  12.863 /    6.661/   8.511] -   25.532%
[22:47:46]             SimFunc2 [  22 /   12 /   15] - [   0.133 /    0.068/   0.084] -    0.252%
[22:48:40]  Frames: 503
[22:48:40]   Id                 Calls [Peak/Avg/Last]    Time(ms) [Peak/Avg/Last]       % of frame
[22:48:40] -----------------------------------------------------------------------------------------
[22:48:40]                  Sim [  16 /   15 /   16] - [ 146.263 /   71.970/ 107.346] -  322.037%
[22:48:40]             SimFunc5 [  26 /   12 /   17] - [ 137.108 /   64.615/  95.444] -  286.331%
[22:48:40]             SimFunc3 [  25 /   12 /   19] - [   2.439 /    0.658/   1.068] -    3.204%
[22:48:40]             SimFunc1 [  26 /   13 /   12] - [   0.053 /    0.010/   0.008] -    0.023%
[22:48:40]             SimFunc4 [  26 /   12 /   18] - [  13.248 /    6.598/  10.696] -   32.089%
[22:48:40]             SimFunc2 [  28 /   12 /   16] - [   0.141 /    0.067/   0.099] -    0.298%

The first metrics were taken after the boxes had been spinning for 236 frames (about 52 seconds) and the last log was created after 503 frames (about 106 seconds).

It’s all in the numbers

So lets take a moment to look at the log, lets consider the function SimFunc4.

[22:48:40]             SimFunc4 [  26 /   12 /   18] - [  13.248 /    6.598/  10.696] -   32.089%


The first 3 numbers relate to the number of calls being made to the function. The most it was called in a single frame was 26 times, the number of times it was processed in the last frame was 18 times and the average number of calls since recording began is 12 times a frame.

The second section of numbers tells us that the last frame spent 10.696 ms processing this function, it’s peek processing time was 13.248 ms and the average overall is 6.598 ms.

Remembering that there are 1000 ms in a second and if we are aiming for 30 frames per second then every frame has available 33.333 ms, which means that this function is taking 32.089% of our target fps. If this was a normal function in a game and it was taking 32% of the time it would certainly become a focus point for optimization and improvement.

You might of noticed that all the timings for the 5 functions just happen to tally almost with the value for the Sim function, this is because the Sim function is the entry point for SimFunc1-5 to be called and hence it’s timings include all the time spent processing those functions as inclusively.

Although we have placed the start and stop at the beginning and end of our test functions there is no reason why they have to be in the same function, indeed in some cases like managers it can be useful to place them in different functions.


In the next article we will build upon what we have so far and improve our profiling system further.

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>