Originally Posted by tpi2007
Also, as I asked, why do these companies need to do this at all to feed their AI? Why don't they just put all these workers and contract workers transcribing all the films ever made, along with all the TV programs and news in the world and feed it all into the AI, along with all the radio programs from the past and present, heck, just tune into every radio station broadcasting in the world right now to have billions of hours of voice recordings to feed the algorithm and then, on top of all that, pour all of the billions of hours of YouTube content into it:
How much voice content do they need to train the AI?
In short, you need lots of content to train the AI.
You can't train it on films or radio. For one thing, the requests will be different. Then there are subtle differences such as background noise, inflexion, etc., all of which will differ between films, radio, and commands spoken to an inanimate device.
Neural networks work by blindly taking a large amount of input, putting each input through some mathematical transformations and summing the outputs. The maths (it's actually quite simple maths, just with a large number of "neurons" spread over a number of layers) is adjusted during training so that certain inputs, e.g. the sound of someone requesting that the lights be turned off, result in the highest score for the output associated with that action.
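To make that a bit more concrete, here is a rough sketch of the "transform, sum, pick the highest score" idea in plain Python. Everything in it is made up for illustration: the three input numbers stand in for audio features, the weights are hand-picked rather than trained, and the action names are hypothetical. A real assistant would use far more inputs, neurons and layers.

```python
import math

def layer(inputs, weights, biases):
    """One layer: each neuron sums its weighted inputs, adds a bias, and squashes the result."""
    return [
        math.tanh(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def predict(features, params, actions):
    """Push the features through every layer and return the highest-scoring action."""
    activations = features
    for weights, biases in params:
        activations = layer(activations, weights, biases)
    best = max(range(len(actions)), key=lambda i: activations[i])
    return actions[best], activations

# Hypothetical, hand-picked numbers (training would normally set these).
ACTIONS = ["lights_off", "play_music"]
PARAMS = [
    ([[1.0, -1.0, 0.0], [-1.0, 1.0, 0.5]], [0.0, 0.0]),  # hidden layer, 2 neurons
    ([[2.0, -1.0], [-1.0, 2.0]], [0.0, 0.0]),            # output layer, 1 score per action
]

# Three made-up numbers standing in for features extracted from a voice clip.
action, scores = predict([0.9, 0.1, 0.3], PARAMS, ACTIONS)
print(action)  # picks "lights_off" with these made-up weights
```

Training is just the process of nudging the numbers in PARAMS until the right action gets the highest score for the training clips.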
Now imagine you trained that action on clips from radio or TV. There, turning the lights off is pretty much limited to scenes where there is an intruder or the protagonists are hiding. Suddenly you're associating pitch, tone, background music, pretty much any input commonly found in those kinds of scenes, with that output. Translate it to the real world and you have a really crap neural network. If we ever get to the point where we have strong AI, this will be less of an issue. For now, however, AI is dumb: it doesn't really think, and it requires realistic training data.
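A toy version of that problem, again only a sketch under invented assumptions: a single "neuron" (a basic perceptron) sees two made-up yes/no features, whether the words "lights off" were spoken and whether tense music is playing. In the pretend TV training data the two always occur together, so the training has no way to tell which one matters.

```python
def fires(weights, bias, features):
    """Return 1 if the neuron's weighted sum is positive, i.e. it would switch the lights off."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train_perceptron(samples, epochs=10):
    """Classic perceptron rule: nudge the weights whenever the prediction is wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - fires(weights, bias, features)
            weights = [w + error * x for w, x in zip(weights, features)]
            bias += error
    return weights, bias

# Hypothetical TV clips as [says_lights_off, tense_music] -> should the lights go off?
tv_clips = [
    ([1, 1], 1),  # "lights off" is only ever said while tense music plays
    ([0, 0], 0),  # ordinary scene: no command, no music
]

weights, bias = train_perceptron(tv_clips)
print(weights)  # both features end up with the same weight

# In a real living room: tense music from the telly, nobody said anything.
false_trigger = fires(weights, bias, [0, 1])
print(false_trigger)  # 1, the lights go off anyway
```

The neuron has learned "tense music" just as strongly as the actual words, which is exactly the kind of association you don't want in a device sitting in someone's home.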
Personally I don't really see the point in these "assistants" given the current state of the tech, but that's a different issue entirely.