This talk has been previously presented at:

  • Sydney Identity and Security Meetup
  • Melbourne Identity and Security Meetup
  • SeaGL 2021
  • Other Emoji Security Talks:

    See also "The Power ⚡️ and Responsibility 😓 of Unicode Adoption ✨", previously presented at: NDC London, DjangoCon US, OSCON, YOW! West, RubyConf AU, Ruby on Rails Oceania, SydPHP, Kiwi PyCon, /dev/world 2016, PyCon AU 2016, Web Directions Code, DjangoCon EU, DevOps Sydney, just to name a few.

    The slides below are from the SeaGL 2021 iteration.

    Note: CVE-2021-42574/Trojan Source was released, but this latest development just re-emphasises that yes, Unicode issues will continue. Read more: Atlassian Advisory, rustc Advisory.

    Hi! I'm Katie. I do a great number of things, but when I'm not changing the world, I enjoy making tapestries, cooking, and seeing just how well various application stacks handle emoji because gosh it's a lot. This is "Expressive Security: Vulnerabilities with Emoji".

    Except it's not.
    It's supposed to be "Expressive Security: Vulnerabilities with Emoji (closed lock with key)"

    But the SeaGL CFP system, sort of, exploded when I tried to do that.
    The fork of OSEM (Open Source Event Manager) that SeaGL uses gave me a LOVELY 500 error trying to submit this talk. But it's okay, it's not just y'all. This problem happens everywhere.

    But first, let me go back a bit.
    TL;DR Because computers, we have this thing called Unicode.
    And Unicode gives us a huge address space to encode over a million characters to allow every character of nearly every written human language to be encoded. But as this standard grew, more than just alphabets were added.
    The very early versions of Unicode had all three of the main writing systems used in Japan, that is hiragana, katakana and kanji, but not their unofficial "fourth" set.
    Originally implemented in different ways by different telcos in the '90s, the space left
    in the character encoding systems for Japanese mobile phones was used to store
    some cute little pictures.
    And these symbols were very popular, but often implemented differently between
    different phone vendors.
    These symbols were standardized in 2007 by putting them into Unicode.
    And nothing happened.
    2010 saw the first iPhones enter Japan, where the iPhone had to support emoji
    to be able to compete, and so Apple implemented it specifically for that region.
    And, eventually, the feature flag for this feature appeared on iPhones in the US.
    Story goes, one day, someone found this feature, hidden in a custom setting in a
    third-party app, and went: huh.. I can send a pile of poop to my friends.
    And thus the precambrian explosion of emoji started.
    Every year we get more and more emoji, addressing memes and frequently
    requested emoji.
    We start getting backwards compatibility, like the cowboy emoji. This emoji was added
    to Unicode because it was an emoticon in Yahoo! Messenger.
    We start getting communities putting together proposals and designs, like the fortune
    Every year, even more emoji are added, as they are approved by the Emoji
    Subcommittee. The submission process is open to the public, which means
    anyone can suggest emoji, and if the submission is strong enough, it might
    get accepted.
    Like me. The parrot was me. Sorry.
    Then we start getting more inclusive, like representation of bionics.
    And more diverse food and animals.
    And the set still isn't 'complete', as we are getting things that we've been missing the
    entire time, like a single coin emoji. (Also, me. Sorry)
    The list for 2021 has been announced, but the images won't be available on your
    devices until probably next year.
    These images here are *sample* implementations, but vendors -- Apple for iOS,
    Google for Android, Microsoft for Windows, etc -- are completely free to choose
    whatever implementation they want.
    Which becomes an issue when the implementations don't match.
    Let me show you an example. Sadly, seagull isn't an emoji, but the parrot is.
    These are all codepoint 1F99C Parrot, but depending on your platform you're either
    ● a scarlet macaw, or anything from
    ● a carolina parakeet to
    ● possibly a Black-winged lovebird
    ● to an eclectus parrot, or
    ● even a sun parakeet.
    While these are all parrots, they're all the wrong parrot.
    None of them look like Sirocco the kākāpō, on which my original parrot emoji
    submission was based.
    You might recognise him as the Party Parrot.
    Party all the time!

    Emoji implementers are under no obligation to use the suggested design in the
    submission docs, so you end up with these variations.
    But it's not just issues between platforms. The issues can be on the same platform between different codepoints.
    The landmark case is of these two emoji, subject of the 2016 paper "“Blissfully
    happy” or “ready to fight”: Varying Interpretations of Emoji" from the University of
    Before 2015 on iPhones, the only difference between these two emoji was the eyes.
    In the west we might look to the mouth to work out the emotion, but particularly in
    Manga, the eyes are used.
    In 2016 the Beaming Face with Smiling Eyes emoji was updated to also have a
    smiling mouth.
    As the years go on, the emoji representations that vendors ship can be updated.
    The Apple emoji from 10 years ago look similar to the emoji today, but are now much
    more detailed.
    Suffice to say... I'm an enormous emoji nerd.
    I've got *multiple* emoji into the standard. I've got published articles about emoji.
    This is all because 5 years ago I started giving talks on it.
    Since 2016, I've given this talk:
    The Power and Responsibility of Unicode Adoption
    Except, that's not the title of the talk.
    The actual title is
    The Power (lightning bolt)
    And responsibility (face with cold sweat)
    of Unicode Adoption (sparkles)
    And the very first time I gave this talk, I had problems.
    My first acceptance email for this presentation had the title of the talk just, truncated.
    "Congratulations! Your talk has been selected! Selected talk: The Power Lightning
    Bolt and Responsibility".
    ... huh.
    Some systems wouldn't let me enter the talk title in the first place, and told me that the
    problem would be 'solved soon'.
    I should hope it was solved, as EasyChair was the CFP system for the 4th
    International Workshop on Emoji Understanding and Applications in Social Media,
    held earlier this year.
    Some conferences get it right in some places, but wrong in others.
    For example for this conference, the title displays correctly on the website schedule..
    .. but on the printed schedule, there's just a void.
    On the AV runsheet, it was giant black boxes everywhere.
    You'll note the "Check Title" note;
    The Session chair for this talk had yet another printout which she, with consent,
    repeated verbatim, so I was introduced as speaking about...
    "The Power (lightning bolt) and responsibility (square) of unicode adoption (square)"
    But even better was the digital display the conference center had outside my talk.
    It worked.. mostly.
    You can see the emoji are there, but this is the segue into the security part of this talk.
    I can tell you, by looking at this picture alone, that this conference center was running
    Windows 8.0.
    And I can tell because of the second emoji.
    This is the "Face with Cold Sweat" emoji, as represented on Windows 8.0.
    For particular codepoints that get updated often, using the knowledge about vendor
    updates we learnt earlier, you can pin down the operating system to the minor version.
    The next version of this emoji may or may not be flat or 3d shaded. Depending on
    which one comes out on the retail version of Windows, you might be able to tell who's
    running a preview.
    This presumes that you're running the base operating system with no customisations.
    Especially if you are running Linux you may have already experienced the issue in the
    lack of richness of your emoji, and there are various ways that you can install custom
    fonts to adjust this.
    But it's not just humans that misunderstand emoji.
    Computers are especially terrible with certain Unicode.
    This is the Pride Emoji.
    It's technically not its own emoji, it's a combination of emoji.
    It's the combination of the blank flag emoji, a rainbow emoji, and between the two, a
    special character called a "zero width joiner", allegedly pronounced "zwidge"(?)
    The zwidge is also the hidden character that allows you to have skin tone modifiers on
    various hand and face emoji.
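    To make the joining concrete, here is a minimal Python sketch (not from the slides) that builds the pride flag from its parts and lists the code points. The helper name `describe` is made up for illustration. One assumption beyond what is said above: the fully-qualified sequence also carries a variation selector-16 after the flag, so it is four code points rather than three.

```python
import unicodedata

# Building blocks of the pride flag sequence.
WHITE_FLAG = "\U0001F3F3"  # the flag on its own
VS16 = "\uFE0F"            # variation selector-16: "render the flag as emoji"
ZWJ = "\u200D"             # zero width joiner, the "zwidge"
RAINBOW = "\U0001F308"

pride_flag = WHITE_FLAG + VS16 + ZWJ + RAINBOW


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


for line in describe(pride_flag):
    print(line)

# One flag on screen, but several code points underneath.
print(len(pride_flag))
```

    Whether this renders as a single flag or as a flag followed by a rainbow depends entirely on the platform's font support.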
    When this functionality was added to emoji back in 2016, it caused some FUN.
    For instance! If instead of this sequence you sent...
    you sent this literal sequence: flag, digit zero, rainbow
    to your mate with an iPhone,
    it'd do this.
    This isn't the first and won't be the last time that unique sequences of characters
    crash mobile devices.
    You can hide a lot of non-display characters in a message, which copy-paste lets
    mobile users share, and which the phone will just nope out of.
    But it's not just emoji: character combinations in scripts like Arabic and Telugu (teh-luh-goo)
    have been reported to cause issues.
    These are just some examples of combinations that have rebooted enough phones to
    get news coverage.
    The biggest example of these issues coming together happened a few years ago
    now, but it's still a super interesting story.
    The link on the screen goes to a great full write up of the issue, including a link to the
    LoopConf recording from the lead dev of WordPress, and a post from the original
    vulnerability identifier.
    This is all about CVE-2015-3438.
    This isn't a problem specific to WordPress, but given it did happen to WordPress and
    WordPress powers a significant percentage of websites, it's not too outlandish to
    have in my abstract "we'll cover how one WordPress fix saved a quarter of
    the internet from XSS through emoji".
    The WordPress 4.2 blog post noted that this release features "extended character
    support", noting "Emoji are now available in WordPress!" (highlighting mine)
    Back in 2015 this was a huge deal. You could use emoji in WordPress if you just
    update your installation? Our users will really like that! Let's do it!
    "Install updates and get new emoji" is actually a really engaging model for making
    people do updates. "Update and get the new emoji! You'll also get many important
    security updates, but also, new emoji!"
    Versions prior to this, though, with particular database setups were susceptible to
    cross-site scripting using emoji as the vector.
    I'm going to show you how this works, with a smile.
    Specifically this smile.
    This smile is not your average smile.

    This is codepoint 1F642 and is the slightly smiling face.
    It was introduced in Unicode 7.0 back in 2014, and it will work for our purposes.
    What won't work is other smiling faces.
    This face won't work. It looks far too good of heart to participate in such mad hacks.

    It's the "white" smiling face, emoji style.
    That FE0F at the end is the variation selector 16 character, which when appended to a
    codepoint that predates emoji asks the system to render it as emoji.
    If you don't specify this:
    You get the text version of the emoji.
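    As a small Python sketch of that difference (an illustration, not from the slides): the two strings below share the same base codepoint, and only the trailing FE0F changes the requested presentation. The helper `strip_emoji_presentation` is a made-up name.

```python
TEXT_SMILE = "\u263A"         # WHITE SMILING FACE on its own: text presentation
EMOJI_SMILE = "\u263A\uFE0F"  # same codepoint followed by variation selector-16


def strip_emoji_presentation(text: str) -> str:
    """Drop every variation selector-16, leaving the bare codepoints behind."""
    return text.replace("\uFE0F", "")


# The emoji-style smile is two code points long, the text-style one is one.
print(len(TEXT_SMILE), len(EMOJI_SMILE))

# Removing the selector gives back exactly the text version.
print(strip_emoji_presentation(EMOJI_SMILE) == TEXT_SMILE)
```

    How each string is actually drawn is still up to the font and platform; the selector is a request, not a guarantee.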
    Another fun fact: If you've ever received an email with a random capital J at the end of
    a sentence; that's from someone sending you email in Microsoft Office using the
    Wingdings font, which has a smiley face for the Capital J codepoint.
    I will also note that this is the "white" smiling face because
    The next codepoint up is the Black smiling face.
    They're called white and black in the Unicode standard, where it's more "unfilled" and "filled".
    Back to this smiling face.
    This codepoint starts with a 1F and has 5 hex digits in its codepoint, as opposed to
    our other smileys, which have 4.
    The encoding we're using here makes it seem as though it's one character, but if we
    expand it out
    It's actually a multibyte character: four bytes in UTF-8.
    Which is the critical part of getting this hack to work.
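    Here is a minimal Python sketch of that expansion (an illustration, not the slide itself), comparing the UTF-8 bytes of the two smiles. The helper name `utf8_hex` is hypothetical.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


WHITE_SMILING = "\u263A"         # 4 hex digits in the codepoint
SLIGHTLY_SMILING = "\U0001F642"  # 5 hex digits in the codepoint

for ch in (WHITE_SMILING, SLIGHTLY_SMILING):
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_hex(ch)} ({len(encoded)} bytes)")
```

    The older smile fits in three bytes, while the slightly smiling face needs four, and that fourth byte is what some storage setups cannot cope with.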
    Let me show you.
    Say you're on a blog and you're going to post a comment. Like a nice person.
    And so you type your comment. It's such a happy comment; you're engaging with the
    author, providing positive feedback on their work, all good.
    But if the setup allows, you end up getting this posted:
    This is such a great post!
    but where is the rest of the comment that I entered?
    Gosh, I should make sure that's added.
    Here we go, let's make sure we get that quote in there.
    And posted. Great :)
    But if we mouse over the page
    We've injected javascript into the application.
    Let's look at what went wrong.
    This is what a section of the rendered page would look like.

    You'll note the copious amounts of green that aren't highlighted.

    And you'll also note the lack of the smiling face we originally entered.
    A lot of websites will have data validation that will disallow script tags and other
    But we can get through some of this by submitting our data in multiple comments.
    In this example we're also taking advantage of the use of different quotation marks to
    effectively escape a bunch of HTML.
    But the biggest part of this is the truncation of our content. That's where the emoji
    comes in.
    What should have been an INSERT into the table with our entire comment
    is actually an INSERT that just truncates the rest of the value at the
    first four-byte character.
    This is a behaviour of MySQL: if you don't have strict mode enabled and are using an
    encoding that doesn't support four-byte characters, MySQL will just.. stop when it gets
    to the strange codepoint.
    So as this cascades back you can insert multiple comments that insert script tags and
    other nonsense that would bypass conventional filters for the literal "script" tag.
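    The truncation described above can be imitated in a few lines of Python. This is only a simulation under stated assumptions, not MySQL's actual code: the made-up function `truncate_like_legacy_utf8` treats any code point above U+FFFF (the ones that need four bytes in UTF-8) as the point where a non-strict insert silently stops.

```python
def truncate_like_legacy_utf8(value: str) -> str:
    """Simulate a non-strict insert into a column limited to 3-byte UTF-8.

    Assumption for illustration: everything from the first code point
    above U+FFFF onwards is silently dropped.
    """
    for index, ch in enumerate(value):
        if ord(ch) > 0xFFFF:
            return value[:index]
    return value


comment = "This is such a great post! \U0001F642 and here is the rest"
stored = truncate_like_legacy_utf8(comment)

# What the application validated and what ended up stored now differ.
print(repr(comment))
print(repr(stored))
```

    The point of the sketch is the mismatch: any filtering done on the full comment says nothing about the shortened value that later gets rendered.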
    If you're using MySQL, make sure you're using the utf8mb4 character set, which supports
    4-byte UTF-8 encoding, to avoid this issue.
    Also set the STRICT_ALL_TABLES SQL mode to make MySQL error if it gets any of these sorts of bad values,
    where the data requested to be inserted doesn't match the actual data inserted.
    There are already several bugs both in the SeaGL fork and the original OSEM software
    from other speakers reporting issues submitting emoji in their talks.
    If you're a Rails dev, maybe give this bug a go? There's advice in those bugs about
    how such a migration works in Rails 5, specifically with changing the encoding of the
    specific fields where emoji would be useful.
    Also, a useful lil app to help you with your emoji poking is this.
    Regardless of which system you're using, try adding the odd emoji in it.
    Adding a single emoji and confirming it's correctly written and displayed back can
    confirm an entire pipeline of unicode conformance, that in theory, also means your
    system will support all the other parts of unicode, like your users who use character
    sets outside of the basic latin alphabet.
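    That "add one emoji and read it back" check can be sketched in Python. Everything here is hypothetical scaffolding: `survives_round_trip`, `GoodBackend` and `TruncatingBackend` are made-up names, and the two in-memory backends stand in for whatever real pipeline (form, API, database, template) you would actually test.

```python
from typing import Callable

PROBE = "emoji probe \U0001F642 end"


def survives_round_trip(
    store: Callable[[str], None],
    load: Callable[[], str],
    probe: str = PROBE,
) -> bool:
    """Write probe through the pipeline, read it back, and compare exactly."""
    store(probe)
    return load() == probe


class GoodBackend:
    """In-memory stand-in for a pipeline that keeps every code point."""

    def __init__(self) -> None:
        self.value = ""

    def store(self, text: str) -> None:
        self.value = text

    def load(self) -> str:
        return self.value


class TruncatingBackend(GoodBackend):
    """Stand-in for a pipeline that cuts at the first 4-byte character."""

    def store(self, text: str) -> None:
        cut = next((i for i, ch in enumerate(text) if ord(ch) > 0xFFFF), len(text))
        self.value = text[:cut]


good, bad = GoodBackend(), TruncatingBackend()
print(survives_round_trip(good.store, good.load))
print(survives_round_trip(bad.store, bad.load))
```

    In a real test you would swap the stand-ins for calls that submit the probe through your actual form or API and fetch what gets displayed back.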
    Thank you so much for watching!
    All the resources for today's presentation, including the slides, are available at
    Have a great rest of your day