Gesture recognition systems have been around for quite some time now, but we still haven't discovered all the possibilities they offer or all the approaches we can take.

The emergence of integrated sensors on the market, including infrared transmitters and receivers, together with precise timing systems operating with great accuracy, makes the use of this technology very encouraging.

Gesture Control PoC Demo Video

There are countless types of sensors that can be used for gesture recognition. Given a chance to try ourselves in this field, we decided to create a new product – a gesture recognition system using a Time of Flight (ToF) camera. Even the best gesture recognition system is pointless without practical use cases, so we developed an application that lets you play Tetris using gestures.

Time of Flight
ToF is a measure of the time it takes for an object, particle, or wave (e.g. acoustic or electromagnetic) to travel a distance through a medium. This information can then be used to establish a time standard, or as a means of measuring velocity or path length, allowing the camera to detect objects in three dimensions.
The technology works by sending multiple pulses of light up to five meters away. The pulses bounce back to the 3D ToF camera when they hit an object, and the time they take to return is used to calculate the object's distance, or depth. Just think of it as sonar or echolocation, albeit with light rather than sound.
Unlike regular cameras, a ToF sensor doesn't need an external light source to work correctly, so it is a perfect solution for places that are not always well lit, for example, car interiors. Thanks to their small dimensions, ToF sensors can be used in a wide range of applications.
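The depth calculation described above boils down to simple arithmetic: multiply the pulse's round-trip time by the speed of light and halve it, since the light travels to the object and back. A minimal sketch (the function name is ours, not from any specific sensor SDK):

```python
# Speed of light in a vacuum, metres per second.
SPEED_OF_LIGHT = 299_792_458.0

def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Depth of an object given the light pulse's round-trip time.

    The pulse travels to the object and back, so the one-way
    distance is half of (time * speed of light).
    """
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A round trip of ~33.3 ns corresponds to an object roughly
# 5 metres away -- the upper range mentioned above.
print(distance_from_round_trip(33.3e-9))  # ≈ 4.99 m
```

The tiny numbers involved show why precise timing systems matter: at five meters the whole round trip takes only tens of nanoseconds.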

The most popular among them are:

  • proximity sensors for robots
  • toilet paper and soap dispensers
  • flushing cisterns in toilets
  • sink mixers
  • object sensors in robotic vacuum cleaners
  • cheap user presence detectors in laptops and monitors
  • inventory management systems in vending machines
  • vending machine coin counters
  • ground proximity detectors for drones
  • ceiling proximity detectors for indoor drones
  • presence and gesture sensors in retail outlets
  • or playing Tetris!

Neural Network
The next step after getting the image was to answer the question: 'What's in the image we've just received from the ToF camera?'. There is hardly any information on the internet about developing gesture recognition on a ToF sensor. After playing with deterministic algorithms for some time, we decided to bring out the big guns. First we created a special algorithm for image preprocessing, then prepared a convolutional neural network model and collected a dataset containing snapshots of the gestures we wanted our neural network to recognize. After a few iterations of optimizing the model, we achieved a recognition accuracy of 98%.
What's more, we managed to run our recognition system in real time on a Raspberry Pi 4!
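To give an idea of what preprocessing a depth frame for a CNN can look like: a depth image assigns each pixel a distance, so a natural first step is to keep only the depth band where a hand is expected and normalize it. The sketch below is our own illustration under assumed parameters (the 0.2–1.2 m band and the frame size are not the exact values from the project):

```python
import numpy as np

def preprocess_depth_frame(frame: np.ndarray,
                           near_m: float = 0.2,
                           far_m: float = 1.2) -> np.ndarray:
    """Prepare a raw ToF depth frame for a CNN classifier.

    frame: 2-D array of distances in metres, one value per pixel.
    Pixels outside the [near_m, far_m] band (background, noise)
    are zeroed; the rest is scaled to [0, 1] so the network sees
    a consistent input range regardless of the scene behind the hand.
    """
    frame = frame.astype(np.float32)
    # Keep only the depth band where a hand is expected to appear.
    mask = (frame >= near_m) & (frame <= far_m)
    # Map near objects to values close to 1, far ones close to 0;
    # everything outside the band stays at 0.
    norm = np.zeros_like(frame)
    norm[mask] = (far_m - frame[mask]) / (far_m - near_m)
    return norm

# Fake 8x8 depth frame: a "hand" at 0.3 m on a 2 m background.
demo = np.full((8, 8), 2.0)
demo[2:6, 2:6] = 0.3
out = preprocess_depth_frame(demo)
print(out.min(), out.max())  # background is 0, hand region is 0.9
```

Normalizing this way also hints at why such a system can run in real time on a Raspberry Pi 4: the per-frame work is a handful of vectorized array operations followed by one forward pass of a small network.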