    Unicode nearing 50% of the web

    Unicode nearing 50% of the web: “About 18 months ago, we published a graph showing that Unicode on the web had just exceeded all other encodings of text on the web. The growth since then has been even more dramatic.

    Web pages can use a variety of different character encodings, like ASCII, Latin-1, or Windows 1252 or Unicode. Most encodings can only represent a few languages, but Unicode can represent thousands: from Arabic to Chinese to Zulu. We have long used Unicode as the internal format for all the text we search: any other encoding is first converted to Unicode for processing.
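As a rough illustration of "converted to Unicode for processing", the sketch below decodes the same word from three byte encodings into one Unicode string. The sample bytes and the `to_unicode` helper are hypothetical and are not from the Google post; a real crawler would also have to detect or verify the declared encoding, which this sketch does not attempt.

```python
# Hypothetical sample data: the same word stored in three different encodings.
samples = {
    "latin-1": "caf\u00e9".encode("latin-1"),
    "cp1252": "caf\u00e9".encode("cp1252"),
    "utf-8": "caf\u00e9".encode("utf-8"),
}


def to_unicode(raw: bytes, declared_encoding: str) -> str:
    """Decode raw bytes into a Python str (Unicode) using the declared encoding."""
    return raw.decode(declared_encoding)


# The byte sequences differ between encodings, but the decoded text is identical.
decoded = {name: to_unicode(raw, name) for name, raw in samples.items()}

for name, text in decoded.items():
    print(name, len(samples[name]), "bytes ->", text)
```

Once everything is a Unicode string, later steps such as indexing and matching no longer need to know which encoding the page originally used.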

    This graph is from Google internal data, based on our indexing of web pages, and thus may vary somewhat from what other search engines find. However, the trends are pretty clear, and the continued rise in use of Unicode makes it even easier to do the processing for the many languages that we cover.

    Searching for ‘nancials’?
    Unicode is growing both in usage and in character coverage. We recently upgraded to the latest version of Unicode, version 5.2 (via ICU and CLDR). This adds over 6,600 new characters: some of mostly academic interest, such as Egyptian Hieroglyphs, but many others for living languages.

    We’re constantly improving our handling of existing characters. For example, the characters ‘fi’ can either be represented as two characters (‘f’ and ‘i’), or as a special display form ‘ﬁ’. A Google search for [financials] or [office] used to not see these as equivalent: to the software they would just look like *nancials and of*ce. There are thousands of characters like this, and they occur in surprisingly many pages on the web, especially generated PDF documents.

    But no longer — after extensive testing, we just recently turned on support for these and thousands of other characters; your searches will now also find these documents. Further steps in our mission to organize the world’s information and make it universally accessible and useful.
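One common way to treat the ligature form and the two-letter form as equivalent is Unicode compatibility normalization (NFKC). The sketch below uses Python's standard `unicodedata` module as a stand-in; the post mentions ICU and does not say which mechanism Google actually uses, so this is only an assumption about how such matching might work. The `fold_compatibility` helper name is hypothetical.

```python
import unicodedata


def fold_compatibility(text: str) -> str:
    """Apply NFKC normalization, which expands ligatures into plain letters."""
    return unicodedata.normalize("NFKC", text)


# "\ufb01" is LATIN SMALL LIGATURE FI; "\ufb03" is LATIN SMALL LIGATURE FFI.
with_fi_ligature = "\ufb01nancials"
with_ffi_ligature = "o\ufb03ce"

# Before folding, the ligature forms do not compare equal to the plain words.
print(with_fi_ligature == "financials")                       # False
# After folding, they do.
print(fold_compatibility(with_fi_ligature) == "financials")   # True
print(fold_compatibility(with_ffi_ligature) == "office")      # True
```

Applying the same folding to both the indexed text and the query is what would let a search for [financials] find a document that stores the word with the ligature.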

    And we’re angling for a party when Unicode hits 50%!

    Posted by Mark Davis, Senior International Software Architect

    (Via Official Google Blog: Network effects: Introducing the Google Apps ….)

    You Can Get Rid of DSL and Go Wireless

    You Can Get Rid of DSL and Go Wireless: “A new class of devices can translate 3G/4G signals into local wi-fi hotspots, letting you ditch your wired broadband connection and go completely wireless.

    (Via Wired News.)

    Memento: Time Travel for the Web

    Memento: Time Travel for the Web: “

    This National Digital Information Infrastructure and Preservation Program (NDIIPP) briefing features the project Memento, presented by Herbert Van de Sompel of the Los Alamos National Laboratory and Michael Nelson from Old Dominion University.

    (Via Library of Congress – Subscriptions.)

    Germanium Laser Brings Optical Computing Closer

    Germanium Laser Brings Optical Computing Closer: “Researchers at MIT have created a germanium laser that’s an important step towards computers that can move data and perform calculations using light instead of electricity.

    (Via Wired News.)

    How to Ace a TED Talk

    How to Ace a TED Talk: “The TED talk is a unique form: 18 minutes to win over an audience which has already seen it all. Stephen Wolfram gets a standing ovation, an honor not lightly given. He frames the arc of his work from the point of view of his own discovery of how complicated things grow from simple rules and quickly compresses decades of work into a few minutes.

    (Via Wired News.)

    Google Voice, explained

    Google Voice, explained: “
    (Cross-posted from the Google Voice Blog)

    Google Voice is about giving you more control over your communications, through dozens of features — ranging from call screening to voicemail transcription to the ability to send and receive SMS by email.

    While we’ve heard from users that they love our growing list of features, we’re conscious of the fact that Google Voice can seem overwhelming [...]

    Google Apps highlights – 2/19/2010

    Google Apps highlights – 2/19/2010: “This is part of a regular series of Google Apps updates that we post every couple of weeks. Look for the label ‘Google Apps highlights‘ and subscribe to the series. – Ed.

    Over the last couple of weeks we’ve been busy adding new functionality to make communicating and sharing with Google Apps easier than ever, whether you use Google Apps [...]

    Microsoft leaves Linux-based FAST customers stranded

    Microsoft leaves Linux-based FAST customers stranded: “

    Buyers of the Linux and UNIX versions of FAST’s Enterprise Search Platform (ESP) got some bad news the other day: It’ll now be necessary to switch to a Windows server platform, or else move to some other product for enterprise search.

    Microsoft Corporation, which as you may recall acquired FAST two years ago for $1.2 billion, has confirmed it [...]

    The Technical Side of PCI DSS

    The Technical Side of PCI DSS: “

    What merchants don’t know about the technical side of protecting customer data can be costly.

    The Payment Card Industry Data Security Standard (PCI DSS) describes 12 system and procedural requirements for securing customer credit card data that is transmitted, processed, or stored by an online merchant.

    In order to accept credit cards as a form of online payment, merchants [...]

    Applying Persuasion to Email Creative

    Applying Persuasion to Email Creative: “

    Persuasion Architecture, developed by conversion optimization gurus Bryan and Jeffrey Eisenberg, is a persona-based approach to marketing. If you know a customer’s ‘buying modality’ (Competitive, Spontaneous, Methodical or Humanistic), you can tailor your design, copy and direct marketing to best persuade that type of personality.

    It would be ideal to segment an email list by personality type. You can apply [...]