Taking to the air

Recently, a new feature popped up on WordPress, which is what I’ve been using to publish this site. They’ve added the ability to generate podcasts directly from posts, and I decided to give it a try, with the first episode going up on June 30.

While it’s entirely possible to read and record everything myself, producing a single podcast is a bit more complicated than just writing and formatting an article, so I’ll be using their AI voices to help. Overall, the reading isn’t half bad, although they do still have a tendency to misread some things.

For example, the year 2021, when written that way, comes out as two thousand and twenty-one instead of twenty twenty-one. And sometimes, it doesn’t know what to do with non-English words.

For example, in one story, it pronounced the Spanish sauce “mole” like the English word for the burrowing mammal. Unfortunately, there’s no accent mark in Spanish since the emphasis is already where it should be, on the “O,” so no clue that way. Oddly enough, it did pronounce the word the right way later on, in the phrase “holy moley,” but obviously spelling the sauce that way wouldn’t work for the printed article.

Unfortunately, it’s not possible to edit the text that’s copied into the sound file generator, which seems like a drawback.

Otherwise, it’s a fun process. I will record the intros and outros with background music and add incidental transition music and the like, but I’m leaving the heavy lifting to the AI because it saves me a lot of time.

Besides, the results are probably a lot more pleasant than people having to listen to my “Where the hell is he from?” accent for a full episode. If you don’t believe me, just listen, and, forgetting anything I’ve told you in this blog, tell me where I’m originally from. I just had too many different regional inputs growing up and somehow picked up on them all.

I am convinced that my hometown has no real accent because it kind of has all of them, and they wind up mixing and matching.

As for the podcast progress, I think I’ll start it out at once per week, and will be recording already existing articles from the blog. Of course, that’s how the system works. The podcast recorder will only pull and convert text from published articles.

At the moment, meaning July 5, 2021, the first episode is only available on Spotify, Pocket Casts, Breaker, and RadioPublic, but it should be available on Google Podcasts some time soon.

Spotify seems to be the one that shows up right away, and the rest are listed in the order they were published over a couple of days. Of course, this whole process could speed up with subsequent episodes.

I never thought I’d have any interest in doing the podcast thing, but I don’t hate it so far! So this could be a short experiment or it could turn into something else. Stay tuned to this space for updates.

And yes, this is a short post, but it’s a holiday weekend in the U.S. as we celebrate our independence — officially, it was on Sunday, July 4, but people have been shooting off fireworks all week long and the Federal holiday is today, Monday.

But happy Independence Days in advance to the visitors from my top twelve countries which are, in date order, January 26, March 26, May 5, June 12, July 1, July 14, August 10, August 14, August 15, September 16, October 1, and October 3. I won’t mention any names. You can figure out who you are.

Image source: Kevin Campbell from Pixabay. (CC) Simplified Pixabay License.

Wednesday Wonders: Facing the music

For some reason, face morphing in music videos really took off, and the whole thing was launched with Michael Jackson’s video for Black or White in 1991. If you’re a 90s kid, you remember a good solid decade of music videos using face-morphing left and right.

Hell, I remember picking up a face-morphing app in the five dollar bin at Fry’s, and although it ran slow as shit on my PC at the time, it did the job and morphed faces and, luckily, it never got killed by the “Oops, Windows isn’t backward compatible with this” problem, so it runs fast as hell now. Well, it did whenever I last used it, and it’s been a hot minute.

If you’ve never worked with the software, it basically goes like this. You load two photos, the before and after. Then, you mark out reference points on the first photo.

These are generally single dots marking common facial landmarks: inside and outside of each eye, likewise the eyebrows and mouth, bridge of the nose, outside and inside of the nostrils, top and bottom of where the ear hits the face, major landmarks along the hairline, and otherwise places where there are major changes of angle.

Next, you play connect the dots, at first in general, but then it becomes a game of triangles. If you’re patient enough and do it right, you wind up with a first image that is pretty closely mapped with a bunch of little triangles.

Meanwhile, this entire time, your software has been plopping that same mapping onto the second image. But, at least with the software I was working with then (and this may have changed), it only plops those points relative to the boundaries of the image, and not the features in it.

Oh yeah — first essential step in the process: Start with two images of identical dimensions, and faces placed about the same way in each.

The next step in the morph is to painstakingly drag each of the points overlaid on the second image to its corresponding face part. Depending upon how detailed you were in the first image, this can take a long, long time. At least the resizing of all those triangles happens automatically.

When you think you’ve got it, click the magic button, and the first image should morph into the second, based on the other parameters you gave it, which are mostly the frame rate.
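If you want to see the bones of that process in code, here’s a minimal sketch of the triangle-based morph described above, assuming Python with NumPy, SciPy, and OpenCV. The function and variable names are mine, and this is nothing like the actual code in that old Fry’s-bin app, just the same general idea: landmarks, triangles, warps, and a cross-dissolve.

# A minimal sketch of the triangle-based morph described above, using NumPy,
# SciPy, and OpenCV (my choices, not whatever that old Fry's-bin app used).
# The landmark arrays stand in for the dots you'd click by hand on each face.
import cv2
import numpy as np
from scipy.spatial import Delaunay

def morph_frame(img1, img2, pts1, pts2, alpha):
    """Return one in-between frame: alpha=0 is img1, alpha=1 is img2.

    img1, img2 : two images with identical dimensions (that essential first step).
    pts1, pts2 : corresponding landmarks, shape (N, 2), e.g. eye corners,
                 nostrils, mouth corners, hairline points.
    """
    pts1, pts2 = np.float32(pts1), np.float32(pts2)
    # Each dot slides partway from its spot on face A to its spot on face B.
    pts_mid = (1 - alpha) * pts1 + alpha * pts2
    h, w = img1.shape[:2]
    out = np.zeros_like(img1, dtype=np.float32)

    # The "game of triangles": triangulate the in-between points once and
    # reuse the same triangle indices on both faces.
    for tri in Delaunay(pts_mid).simplices:
        t1, t2, tm = pts1[tri], pts2[tri], pts_mid[tri]

        # Warp each source triangle onto the in-between triangle.
        warp1 = cv2.warpAffine(img1, cv2.getAffineTransform(t1, tm), (w, h))
        warp2 = cv2.warpAffine(img2, cv2.getAffineTransform(t2, tm), (w, h))

        # Cross-dissolve the two warps, but only inside this triangle.
        mask = np.zeros((h, w), dtype=np.uint8)
        cv2.fillConvexPoly(mask, np.int32(tm), 1)
        blend = (1 - alpha) * warp1 + alpha * warp2
        out[mask == 1] = blend[mask == 1]

    return out.astype(np.uint8)

# Halfway through the transition (hypothetical inputs):
# frame = morph_frame(face_a, face_b, landmarks_a, landmarks_b, alpha=0.5)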

And that’s just for a still image. For a music video, repeat that for however many seconds any particular transition takes, times 24 frames per second. Ouch!

I think this will give you a greater appreciation of what Jackson’s producers did.

However… this was only the first computerized attempt at the effect in a music video. Six years earlier, in 1985, the English duo Godley & Creme (one half of 10cc, so… 5cc?) released their video for Cry, and their face-morphing effect was full-on analog. They didn’t have the advantage of powerful (or even wimpy) computers back then. Oh, sure, Hollywood had pulled off some early CGI effects for TRON in 1982, but those simple graphics were nowhere near good enough to swap faces.

So Godley & Creme did it the old-fashioned way, and anyone who has ever worked in old-school video production (or has nerded out over the Death Star firing sequence in Episode IV) will know the term “Grass Valley Switcher.”

Basically, it was a hardware device that could take the input from two or more video sources, as well as provide its own video in the form of color fields and masks, and then swap them back and forth or transition one to the other.

And this is what they did in their music video for Cry.

Although, to be fair, they did it brilliantly because they were careful in their choices. Some of their transitions are fades from image A to B, while others are wipes, top down or bottom up. It all depended upon how well the images matched.

In 2017, the group Elbow did an intentional homage to this video with their song Gentle Storm, using the same technique well into the digital age, with a nod from Benedict Cumberbatch.

And now we come to 2020. See, all of those face morphing videos from 1991 through the early 2000s still required humans to sit down and mark out the face parts and those triangles and whatnot, so it was a painstaking process.

And then, this happens…

These face morphs were created by a neural network that basically looked at the mouth parts and listened to the syllables of the song, and then kind of sort of found other faces and phonemes that matched, and then yanked them all together.

The most disturbing part of it, I think, is how damn good it is compared to all of the other versions. Turn off the sound or don’t understand the language, and it takes Jackson’s message from Black or White into the stratosphere.

Note, though, that this song is from a band named for its lead singer, Lil’ Coin (translated from Russian), and the song itself, titled Everytime, is about crime and corruption in Russia in the 1990s. So… without cultural context, the reason for the morphing is ambiguous.

But it’s still interesting to note that 35 years after Godley & Creme first did the music video face morph, it’s still a popular technique with artists. And, honestly, if we don’t limit it to faces or moving media, it’s a hell of a lot older than that. As soon as humans figured out that they could exploit a difference in point of view, they began making images change before our eyes.

Sometimes, that’s a good thing artistically. Other times, when the changes are less benevolent, it’s a bad thing. It’s especially disturbing that AI is getting into the game, and Lil’ Coin’s video is not necessarily a good sign.

Oh, sure, a good music video, but I can’t help but think that it was just a test launch in what is going to become a long, nasty, and ultimately unwinnable cyber war.

After all… how can any of you prove that this article wasn’t created by AI? Without asking me the right questions, you can’t. So there you go.

Image: (CC BY-SA 2.0) Edward Webb

Our best weapon against AI is humor

My day job revolves around health insurance and, because of HIPAA regulations, the office has landlines. We can’t do VoIP because it’s not as secure. The theater I work at some evenings uses nothing but VoIP. I’m sure that the main consequence of this is that the theater never gets robo or sales calls, while the office gets them constantly.

Fortunately, I have absolutely no obligation to be nice to robo-callers or even to listen to their pitches. I’ve hung up on them in mid-sentence. To make it more confusing for them, I’ve hung up in the middle of my sentence. Sometimes, if they’re trying to pitch a service that the boss already has and I know that he did meticulous research before he obtained it or has a personal relationship with the provider, I’ll respond with a terse, “Thanks, but we’re happy with what we have,” and then hang up.

The fun ones are when we get calls trying to sell Medicare insurance. They start out just talking about Medicare Supplement plans, and those are perfectly legal to advertise. Why? Because no matter the provider, each particular plan has the same premium, determined by age, and has the same basic benefits.

These are the plans that cover deductibles, copays, and coinsurance not covered by other plans or Medicare itself. Where they differ is in the extras they toss on. Some of them provide gym benefits; others provide personal emergency systems, i.e. the “I’ve fallen and I can’t get up” necklace; still others provide free over-the-counter stuff, like vitamins and cold remedies, by mail. It’s a mix-and-match, and what it’s really doing is letting people decide what they prefer among plans that are otherwise identical.

So far, so good. If it’s a slow day and I get one of these calls, I will always push the button for more info, which connects me to a live operator. This is where it gets fun, because it is illegal to cold-call someone to try to sell them Medicare Advantage or Medicare Prescription Drug Plans.

Don’t worry if you don’t know what all those terms mean. I didn’t either six months ago. The gist of it is that selling these in the same way is illegal because their costs and coverages vary wildly, and it all depends upon the person being insured, and which medications they’re taking.

For somebody taking no drugs or with one or two common and cheap generics, Coverage X may only cost $13 a month. For someone with a lot of prescriptions, especially if one or more only come in a brand instead of a generic, Coverage X may cost hundreds or thousands of dollars a year. And for each of them, the price of Coverage X, Coverage Y, and Coverage Z may also vary widely, also depending on whether they have a preferred pharmacy or not, and whether that pharmacy is in or out of network for the provider.

In other words… this is something people need to discuss with a professional who can look at their specific needs, analyze the options, and give the best and cheapest advice. That cold caller is probably only calling for a small number of providers (or even just one), so they don’t care what your situation is going to cost. They only want to get you to buy what they’re selling.

And that is a big part of why these kinds of calls are so illegal.

Now, when I get a person doing one of these calls on the line, they will usually launch into a fast-talking spiel about how they can save me and my family money on all of our health insurance needs, including Medicare Advantage or Drug Plans, and what would I like to sign up for today?

My reply is always, “Hey, you sell Medicare insurance, too? So do we. My boss is an insurance broker.”

Analogy time: This would be the equivalent of somebody robo-dialing in order to hire a hitman to take out a rival, giving the fully incriminating pitch to whomever answers, and then finding out they’d called the FBI.

When I say this, I can hear the sudden confusion in the silence and the unstated “Oh, shit.” It takes a second or two, but then I hear them hang up on me, and that is the Holy Grail of dealing with these unethical idiots: making them end the call.

Some of them must be paying attention, though, because the other day I got one of these calls during a slow late afternoon, hit 1 to talk to a rep and then instead of immediately being put through, got some hold music, and then after about ten seconds, the call disconnected.

So, the other Holy Grail: I think I actually got our office number blocked by a spamming, illegal robo-caller. That’s really satisfying.

However, there’s another trend in these robo-calls that’s somewhat more disturbing on a couple of fronts. First is that it could actually put people out of jobs. And yes, while we all hate these kinds of calls, I still get that for some people, these jobs are their tenuous lifelines. I blame the companies behind them, not the people who have no options other than to work for them.

Second is that this trend is using AI, and it’s getting a lot better. When you get a call that has a voice announcement or is reading off a recorded message, it’s pretty obvious what it is. Beyond the robotic cadence or the message outright stating that it’s a recording, there’s also just a huge difference in sound quality between a recording or digital audio and a live speaker.

Why is this? Simple. Digital or analog audio goes directly through an input line to the headset speaker in your phone. Spoken voice has to take the extra step of traversing a few millimeters of open air between the speaker’s mouth and their microphone, and this creates a completely different quality. You don’t even have to be an audiophile to pick up on it. It’s something we just automatically sense. “Recording” and “Real Person” appear as different from each other as “Mannequin” and “Human Being.”

But then they tweaked the technology, and now I’ve met a couple of AI robo-callers that were obviously filtered to sound like real people with that atmospheric connection. I don’t doubt that this is now a trivial process to add via computer, although to be honest, it could be done really low-tech and in cheap analog by setting up a speaker playing the voice next to a handset picking it up. Either way… these couple of calls got me at first.

Call number one was easy to spot after the initial two exchanges, because the voice launched into the uninterruptible spiel, so, despite the sound quality, I got it and hung up.

The second and, so far, last time, it was a bit harder. The very human-sounding voice started out with, “Hello, how are you today?” I replied, “Fine, and you?” It replied, “Great, thanks for asking. Can I ask you some questions about your family’s shopping habits?” “Sure,” I said, waiting for an opportunity to mess with them, but then I also noticed that there seemed to be slightly too long of a pause between their question and my response. Also, every response started with a filler word. And the next response nailed it for me.

“That’s great. Are you responsible for the grocery shopping in your household?”

Trivial thing, but just like we can detect by hearing whether a voice on the phone is recorded or live, our brains are also wired to detect whether we’re talking to a human, and this was the point where the bot failed the Turing Test. The responses were a bit mechanical and not keying into my tone at all. So I decided to give it a real test and replied, “I only pay for it, but everyone else decides what they want.”

The pause was slightly longer, and then came the reply, “I’m sorry. I don’t understand. Can you repeat that?” Of course, the human response would have been a laugh at a thing that AI hasn’t mastered yet: A joke.

Bingo, busted bot. So lots of points for the realism of the voice, delivery, and sound quality, but there’s still a long way to go on making it believable, and this is a very, very good thing, indeed. If you think it’s a bot, engage it with non-sequiturs and humor, and see how fast it falls apart.


Image: Alan Turing Memorial by Bernt Rostad, (CC BY 2.0).