Spatial embedding

Spatial embedding

Spatial embedding is one of feature learning techniques used in spatial analysis where points, lines, polygons or other spatial data types. representing geographic locations are mapped to vectors of real numbers. Conceptually it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space with a much lower dimension. Such embedding methods allow complex spatial data to be used in neural networks and have been shown to improve performance in spatial analysis tasks == Embedded data types == Geographic data can take many forms: text, images, graphs, trajectories, polygons. Depending on the task, there may be a need to combine multimodal data from different sources. The next section describes examples of different types of data and their uses. === Text === Geolocated posts on social media can be used to acquire a library of documents bound to a given place that can be later transformed to embedded vectors using word embedding techniques. === Image === Satellites and aircraft collect digital spatial data acquired from remotely sensed images which can be used in machine learning. They are sometimes hard to analyse using basic image analysis methods and convolutional neural networks can be used to acquire an embedding of images bound to a given geographical object or a region. === Point === A single point of interest (POI) can be assigned multiple features that can be used in machine learning. These could be demographic, transportation, meteorological, or economic data, for example. When embedding single points, it is common to consider the entire set of available points as nodes in a graph. === Line / multiline === Among other things, motion trajectories are represented as lines (multilines). Individual trajectories are embedded taking into account travel time, distances and also features of points visited along the way. Embedding of trajectories allows to improve performance of such tasks as clustering and also categorization. === Polygon === The geographic areas analyzed in machine learning are defined by both administrative boundaries and top-down division into grids of regular shapes such as rectangles, for example. Both types are represented as polygons and, like points, can be assigned different demographic, transportation, or economic features. A polygon can also have features related to the size of the area or shape it represents. === Graph === An example domain where graph representation is used is the street layout in a city, where vertices can be intersections and edges can be roads. The vertices can also be destination points like public transport stops or important points in the city, and the edges represent the flow between them. Embedding graphs or single vertices allows to improve accuracy of analysis methods in which the treated geographical domain can be represented as a network. == Usage == POI recommendation - generating personalized point of interest recommendations based on user preferences. Next/future location prediction - prediction of the next location a person will go to based on their historical trajectory. Zone functions classification - based on different mobility of people or POI distribution a function of a given area in a city can be predicted. Crime prediction - estimation of crime rate in different regions of a city. Local event detection - studying spatio-temporal changes in embeddings can provide valuable information in detection of local event occurring in specific location. Regional mobility popularity prediction - analysis of mobility can show patterns in popularity of different regions in a city. Shape matching - finding a similar shape of given polygon, for example finding building with the same shape as input building. Travel time estimation - predicting estimated travel time given current traffic conditions and special occurring events. Time estimation for on-demand food delivery - estimation of delivery time when placing an order through the website. == Temporal aspect == Some of the data analyzed has a timestamp associated with it. In some cases of data analysis this information is omitted and in others it is used to divide the set into groups. The most common division is the separation of weekdays from weekends or division into hours of the day. This is particularly important in the analysis of mobility data, because the characteristics of mobility during the week and at different times of the day are very different from each other. Another area in which time division into, for example, individual months can be used is in the analysis of tourism of a given region. In order to take such a split into account, embedding methods treat the time stamp specifically or separate versions of the model are developed for different subgroups of the analyzed set.

Image translation

Image translation is the machine translation of images of printed text (posters, banners, menus, screenshots etc.). This is done by applying optical character recognition (OCR) technology to an image to extract any text contained in the image, and then have this text translated into a language of their choice, and the applying digital image processing on the original image to get the translated image with a new language. == General == Machine translation made available on the internet (web and mobile) is a notable advance in multilingual communication eliminating the need for an intermediary translator/interpreter, translating foreign texts still poses a problem to the user as they cannot be expected to be able to type the foreign text they wish to translate and understand. Manually entering the foreign text may prove to be a difficulty especially in cases where an unfamiliar alphabet is used from a script which user can't read, e.g. Cyrillic, Chinese, Japanese etc. for an English speaker or any speaker of a Latin-based language or vice versa. The technical advancements in OCR made it possible to recognize text from images. The possibility to use one's mobile device's camera to capture and extract printed text is also known as mobile OCR and was first introduced in Japanese manufactured mobile telephones in 2004. Using the handheld's camera one could take a picture of (a line of) text and have it extracted (digitalized) for further manipulation such as storing the information in their contacts list, as a web page address (URL) or text to use in an SMS/email message etc. Presently, mobile devices having a camera resolution of 2 megapixels or above with an auto-focus ability, often feature the text scanner service. Taking the text scanning facility one step further, image translation emerged, giving users the ability to capture text with their mobile phone's camera, extract the text, and have it translated in their own language. More and more applications emerged on this technology including Word Lens. After getting acquired by Google, it was made a part of Google Translate mobile app. Another simultaneous advancement in Image Processing, has also made it possible now to replace the text on the image with the translated text and create a new image altogether. == History == The development of the image translation service springs from the advances in OCR technology (miniaturization and reduction of memory resources consumed) enabling text scanning on mobile telephones. Among the first to announce mobile software capable of “reading” text using the mobile device's camera is International Wireless Inc. who in February 2003 released their “CheckPoint” and “WebPoint” applications. “CheckPoint” reads critical symbolic information on checks and is aimed at reducing losses that mobile merchants suffer from “bounced” checks by scanning the MICR number on the bottom of a check, while “WebPoint” enables the visual recognition and decoding of printed URL's, which are then opened by the device's web browser. The first commercial release of a mobile text scanner, however, took place in December 2004 when Vodafone and Sharp began selling the 902SH mobile which was the first to feature a 2 megapixel digital camera with optical zoom. Among the device's various multimedia features was the built-in text/bar code/QR code scanner. The text scanner function could handle up to 60 alphabetical characters simultaneously. The scanned text could be then sent as an email or SMS message, added as a dictionary entry or, in the case of scanned URLs, opened via the device's web browser. All subsequent Sharp mobiles feature the text scanner functionality. In September 2005, NEC Corporation and the Nara Institute of Science and Technology in Japan (NAIST) announced new software capable of transforming cameraphones into text scanners. The application differs substantially from similarly equipped mobile telephones in Japan (able to scan businesscards and small bits of text and use OCR to convert that to editable text or to URL addresses) by it ability to scan a whole page. The two companies, however, said they would not release the software commercially before the end of 2008. Combining the text scanner function with machine translation technology was first made by US company RantNetwork who in July 2007 started selling the Communilator, a machine translation application for mobile devices featuring the Image Translation functionality. Using the built-in camera, the mobile user could take a picture of some printed text, apply OCR to recognize the text and then translate it into any one of over 25 language available. In April 2008 Nokia showcased their Shoot-to-Translate application for the N73 model which is capable of taking a picture using the device's camera, extracting the text and then translating it. The application only offers Chinese to English translation, and does not handle large segments of text. Nokia said they are in the process of developing their Multiscanner product which, besides scanning text and business cards, would be able to translate between 52 languages. Again in April 2008, Korean company Unichal Inc. released their handheld Dixau text scanner capable of scanning and recognizing English text and then translating it into Korean using online translation tools such as Wikipedia or Google Translate. The device is connected to a PC or a laptop via the USB port. In February 2009, Bulgarian company Interlecta presented at the Mobile World Congress in Barcelona their mobile translator including image recognition and speech synthesis. The application handles all European languages along with Chinese, Japanese and Korean. The software connects to a server over the Internet to accomplish the image recognition and the translation. In May 2014, Google acquired Word Lens to improve the quality of visual and voice translation. It is able to scan text or picture with one's device and have it translated instantly. Since the OCR has been improving many companies or website started combining OCR and translation, to read the text from an image and show the translated text. In August 2018, an Indian company created ImageTranslate. It is able to read, translate and re-create the image in another language. As of late 2018, the tool added 13 new languages, including Arabic, Thai, Vietnamese, Hindi, and Bengali, significantly increasing its utility in Asia and the Middle East. This helps users translate photos already stored in their phone's gallery, not just live, real-time views. Currently, image translation is offered by the following companies: Google Translate app with camera ImageTranslate Yandex

Malleability (cryptography)

Malleability is a property of some cryptographic algorithms. An encryption algorithm is said to be malleable if it is possible to transform a ciphertext into another ciphertext which decrypts to a related plaintext. That is, given an encryption of a plaintext m {\displaystyle m} , it is possible to generate another ciphertext which decrypts to f ( m ) {\displaystyle f(m)} , for a known function f {\displaystyle f} , without necessarily knowing or learning m {\displaystyle m} . Malleability is often an undesirable property in a general-purpose cryptosystem, since it allows an attacker to modify the contents of a message. For example, suppose that a bank uses a stream cipher to hide its financial information, and a user sends an encrypted message containing, say, "TRANSFER $0000100.00 TO ACCOUNT #199." If an attacker can modify the message on the wire, and can guess the format of the unencrypted message, the attacker could change the amount of the transaction, or the recipient of the funds, e.g. "TRANSFER $0100000.00 TO ACCOUNT #227". Malleability does not refer to the attacker's ability to read the encrypted message. Both before and after tampering, the attacker cannot read the encrypted message. On the other hand, some cryptosystems are malleable by design. In other words, in some circumstances it may be viewed as a feature that anyone can transform an encryption of m {\displaystyle m} into a valid encryption of f ( m ) {\displaystyle f(m)} (for some restricted class of functions f {\displaystyle f} ) without necessarily learning m {\displaystyle m} . Such schemes are known as homomorphic encryption schemes. A cryptosystem may be semantically secure against chosen-plaintext attacks or even non-adaptive chosen-ciphertext attacks (CCA1) while still being malleable. However, security against adaptive chosen-ciphertext attacks (CCA2) is equivalent to non-malleability. == Example malleable cryptosystems == In a stream cipher, the ciphertext is produced by taking the exclusive or of the plaintext and a pseudorandom stream based on a secret key k {\displaystyle k} , as E ( m ) = m ⊕ S ( k ) {\displaystyle E(m)=m\oplus S(k)} . An adversary can construct an encryption of m ⊕ t {\displaystyle m\oplus t} for any t {\displaystyle t} , as E ( m ) ⊕ t = m ⊕ t ⊕ S ( k ) = E ( m ⊕ t ) {\displaystyle E(m)\oplus t=m\oplus t\oplus S(k)=E(m\oplus t)} . In the RSA cryptosystem, a plaintext m {\displaystyle m} is encrypted as E ( m ) = m e mod n {\displaystyle E(m)=m^{e}{\bmod {n}}} , where ( e , n ) {\displaystyle (e,n)} is the public key. Given such a ciphertext, an adversary can construct an encryption of m t {\displaystyle mt} for any t {\displaystyle t} , as E ( m ) ⋅ t e mod n = ( m t ) e mod n = E ( m t ) {\textstyle E(m)\cdot t^{e}{\bmod {n}}=(mt)^{e}{\bmod {n}}=E(mt)} . For this reason, RSA is commonly used together with padding methods such as OAEP or PKCS1. In the ElGamal cryptosystem, a plaintext m {\displaystyle m} is encrypted as E ( m ) = ( g b , m A b ) {\displaystyle E(m)=(g^{b},mA^{b})} , where ( g , A ) {\displaystyle (g,A)} is the public key. Given such a ciphertext ( c 1 , c 2 ) {\displaystyle (c_{1},c_{2})} , an adversary can compute ( c 1 , t ⋅ c 2 ) {\displaystyle (c_{1},t\cdot c_{2})} , which is a valid encryption of t m {\displaystyle tm} , for any t {\displaystyle t} . In contrast, the Cramer-Shoup system (which is based on ElGamal) is not malleable. In the Paillier, ElGamal, and RSA cryptosystems, it is also possible to combine several ciphertexts together in a useful way to produce a related ciphertext. In Paillier, given only the public key and an encryption of m 1 {\displaystyle m_{1}} and m 2 {\displaystyle m_{2}} , one can compute a valid encryption of their sum m 1 + m 2 {\displaystyle m_{1}+m_{2}} . In ElGamal and in RSA, one can combine encryptions of m 1 {\displaystyle m_{1}} and m 2 {\displaystyle m_{2}} to obtain a valid encryption of their product m 1 m 2 {\displaystyle m_{1}m_{2}} . Block ciphers in the cipher block chaining mode of operation, for example, are partly malleable: flipping a bit in a ciphertext block will completely mangle the plaintext it decrypts to, but will result in the same bit being flipped in the plaintext of the next block. This allows an attacker to 'sacrifice' one block of plaintext in order to change some data in the next one, possibly managing to maliciously alter the message. This is essentially the core idea of the padding oracle attack on CBC, which allows the attacker to decrypt almost an entire ciphertext without knowing the key. For this and many other reasons, a message authentication code is required to guard against any method of tampering. == Complete non-malleability == Fischlin, in 2005, defined the notion of complete non-malleability as the ability of the system to remain non-malleable while giving the adversary additional power to choose a new public key which could be a function of the original public key. In other words, the adversary shouldn't be able to come up with a ciphertext whose underlying plaintext is related to the original message through a relation that also takes public keys into account.

Knapsack problem

The knapsack problem is the following problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. The problem often arises in resource allocation where the decision-makers have to choose from a set of non-divisible projects or tasks under a fixed budget or time constraint, respectively. The knapsack problem has been studied for more than a century, with early works dating back to 1897. The subset sum problem is a special case of the decision and 0-1 problems where for each kind of item, the weight equals the value: w i = v i {\displaystyle w_{i}=v_{i}} . In the field of cryptography, the term knapsack problem is often used to refer specifically to the subset sum problem. The subset sum problem is one of Karp's 21 NP-complete problems. == Applications == Knapsack problems appear in real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials, selection of investments and portfolios, selection of assets for asset-backed securitization, and generating keys for the Merkle–Hellman and other knapsack cryptosystems. One early application of knapsack algorithms was in the construction and scoring of tests in which the test-takers have a choice as to which questions they answer. For small examples, it is a fairly simple process to provide the test-takers with such a choice. For example, if an exam contains 12 questions each worth 10 points, the test-taker need only answer 10 questions to achieve a maximum possible score of 100 points. However, on tests with a heterogeneous distribution of point values, it is more difficult to provide choices. Feuerman and Weiss proposed a system in which students are given a heterogeneous test with a total of 125 possible points. The students are asked to answer all of the questions to the best of their abilities. Of the possible subsets of problems whose total point values add up to 100, a knapsack algorithm would determine which subset gives each student the highest possible score. A 1999 study of the Stony Brook University Algorithm Repository showed that, out of 75 algorithmic problems related to the field of combinatorial algorithms and algorithm engineering, the knapsack problem was the 19th most popular and the third most needed after suffix trees and the bin packing problem. == Definition == The most common problem being solved is the 0-1 knapsack problem, which restricts the number x i {\displaystyle x_{i}} of copies of each kind of item to zero or one. Given a set of n {\displaystyle n} items numbered from 1 up to n {\displaystyle n} , each with a weight w i {\displaystyle w_{i}} and a value v i {\displaystyle v_{i}} , along with a maximum weight capacity W {\displaystyle W} , maximize ∑ i = 1 n v i x i {\displaystyle \sum _{i=1}^{n}v_{i}x_{i}} subject to ∑ i = 1 n w i x i ≤ W {\displaystyle \sum _{i=1}^{n}w_{i}x_{i}\leq W} and x i ∈ { 0 , 1 } {\displaystyle x_{i}\in \{0,1\}} . Here x i {\displaystyle x_{i}} represents the number of instances of item i {\displaystyle i} to include in the knapsack. Informally, the problem is to maximize the sum of the values of the items in the knapsack so that the sum of the weights is less than or equal to the knapsack's capacity. The bounded knapsack problem (BKP) removes the restriction that there is only one of each item, but restricts the number x i {\displaystyle x_{i}} of copies of each kind of item to a maximum non-negative integer value c {\displaystyle c} : maximize ∑ i = 1 n v i x i {\displaystyle \sum _{i=1}^{n}v_{i}x_{i}} subject to ∑ i = 1 n w i x i ≤ W {\displaystyle \sum _{i=1}^{n}w_{i}x_{i}\leq W} and x i ∈ { 0 , 1 , 2 , … , c } . {\displaystyle x_{i}\in \{0,1,2,\dots ,c\}.} The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item and can be formulated as above except that the only restriction on x i {\displaystyle x_{i}} is that it is a non-negative integer. maximize ∑ i = 1 n v i x i {\displaystyle \sum _{i=1}^{n}v_{i}x_{i}} subject to ∑ i = 1 n w i x i ≤ W {\displaystyle \sum _{i=1}^{n}w_{i}x_{i}\leq W} and x i ∈ N . {\displaystyle x_{i}\in \mathbb {N} .} One example of the unbounded knapsack problem is given using the figure shown at the beginning of this article and the text "if any number of each book is available" in the caption of that figure. == Computational complexity == The knapsack problem is interesting from the perspective of computer science for many reasons: The decision problem form of the knapsack problem (Can a value of at least V be achieved without exceeding the weight W?) is NP-complete, thus there is no known algorithm that is both correct and fast (polynomial-time) in all cases. There is no known polynomial algorithm which can tell, given a solution, whether it is optimal (which would mean that there is no solution with a larger V). This problem is co-NP-complete. There is a pseudo-polynomial time algorithm using dynamic programming. There is a fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine, described below. Many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly. There is a link between the "decision" and "optimization" problems in that if there exists a polynomial algorithm that solves the "decision" problem, then one can find the maximum value for the optimization problem in polynomial time by applying this algorithm iteratively while increasing the value of k. On the other hand, if an algorithm finds the optimal value of the optimization problem in polynomial time, then the decision problem can be solved in polynomial time by comparing the value of the solution output by this algorithm with the value of k. Thus, both versions of the problem are of similar difficulty. One theme in research literature is to identify what the "hard" instances of the knapsack problem look like, or viewed another way, to identify what properties of instances in practice might make them more amenable than their worst-case NP-complete behaviour suggests. The goal in finding these "hard" instances is for their use in public-key cryptography systems, such as the Merkle–Hellman knapsack cryptosystem. More generally, better understanding of the structure of the space of instances of an optimization problem helps to advance the study of the particular problem and can improve algorithm selection. Furthermore, notable is the fact that the hardness of the knapsack problem depends on the form of the input. If the weights and profits are given as integers, it is weakly NP-complete, while it is strongly NP-complete if the weights and profits are given as rational numbers. However, in the case of rational weights and profits it still admits a fully polynomial-time approximation scheme. === Unit-cost models === The NP-hardness of the Knapsack problem relates to computational models in which the size of integers matters (such as the Turing machine). In contrast, decision trees count each decision as a single step. Dobkin and Lipton show an 1 2 n 2 {\displaystyle {1 \over 2}n^{2}} lower bound on linear decision trees for the knapsack problem, that is, trees where decision nodes test the sign of affine functions. This was generalized to algebraic decision trees by Steele and Yao. If the elements in the problem are real numbers or rationals, the decision-tree lower bound extends to the real random-access machine model with an instruction set that includes addition, subtraction and multiplication of real numbers, as well as comparison and either division or remaindering ("floor"). This model covers more algorithms than the algebraic decision-tree model, as it encompasses algorithms that use indexing into tables. However, in this model all program steps are counted, not just decisions. An upper bound for a decision-tree model was given by Meyer auf der Heide who showed that for every n there exists an O(n4)-deep linear decision tree that solves the subset-sum problem with n items. Note that this does not imply any upper bound for an algorithm that should solve the problem for any given n. == Solving == Several algorithms are available to solve knapsack problems, based on the dynamic programming approach, the branch and bound approach or hybridizations of both approaches. === Dynamic programming in-advance algorithm === The unbounded knapsack problem (UKP) places no restriction on the number of copies of each kind of item. Besides, here we assume that x i > 0 {\displaystyle x_{i}>0} m [ w ′ ] = max ( ∑ i = 1 n v i x i ) {\displaystyle m[w']=\max \left(\sum _{i=1}^{n}v_{i}x_{i}\right)} subject to ∑

Social profiling

Social profiling is the process of constructing a social media user's profile using their social data. In general, profiling refers to the data science process of generating a person's profile with computerized algorithms and technology. There are various platforms for sharing this information with the proliferation of growing popular social networks, including but not limited to LinkedIn, Google+, Facebook and Twitter. == Social profile and social data == A person's social data refers to the personal data that they generate either online or offline (for more information, see social data revolution). A large amount of these data, including one's language, location and interest, is shared through social media and social network. Users join multiple social media platforms and their profiles across these platforms can be linked using different methods to obtain their interests, locations, content, and friend list. Altogether, this information can be used to construct a person's social profile. Meeting the user's satisfaction level for information collection is becoming more challenging. This is because of too much "noise" generated, which affects the process of information collection due to explosively increasing online data. Social profiling is an emerging approach to overcome the challenges faced in meeting user's demands by introducing the concept of personalized search while keeping in consideration user profiles generated using social network data. A study reviews and classifies research inferring users social profile attributes from social media data as individual and group profiling. The existing techniques along with utilized data sources, the limitations, and challenges were highlighted. The prominent approaches adopted include machine learning, ontology, and fuzzy logic. Social media data from Twitter and Facebook have been used by most of the studies to infer the social attributes of users. The literature showed that user social attributes, including age, gender, home location, wellness, emotion, opinion, relation, influence are still need to be explored. === Personalized meta-search engines === The ever-increasing online content has resulted in the lack of proficiency of centralized search engine's results. It can no longer satisfy user's demand for information. A possible solution that would increase coverage of search results would be meta-search engines, an approach that collects information from numerous centralized search engines. A new problem thus emerges, that is too much data and too much noise is generated in the collection process. Therefore, a new technique called personalized meta-search engines was developed. It makes use of a user's profile (largely social profile) to filter the search results. A user's profile can be a combination of a number of things, including but not limited to, "a user's manual selected interests, user's search history", and personal social network data. == Social media profiling == According to Samuel D. Warren II and Louis Brandeis (1890), disclosure of private information and the misuse of it can hurt people's feelings and cause considerable damage in people's lives. Social networks provide people access to intimate online interactions; therefore, information access control, information transactions, privacy issues, connections and relationships on social media have become important research fields and are subjects of concern to the public. Ricard Fogues and other co-authors state that "any privacy mechanism has at its base an access control", that dictate "how permissions are given, what elements can be private, how access rules are defined, and so on". Current access control for social media accounts tend to still be very simplistic: there is very limited diversity in the category of relationships on for social network accounts. User's relationships to others are, on most platforms, only categorized as "friend" or "non-friend" and people may leak important information to "friends" inside their social circle but not necessarily users to they consciously want to share the information to. The below section is concerned with social media profiling and what profiling information on social media accounts can achieve. === Privacy leaks === A lot of information is voluntarily shared on online social networks, such as photos and updates on life activities (new job, hobbies, etc.). People rest assured that different social network accounts on different platforms will not be linked as long as they do not grant permission to these links. However, according to Diane Gan, information gathered online enables "target subjects to be identified on other social networking sites such as Foursquare, Instagram, LinkedIn, Facebook and Google+, where more personal information was leaked". The majority of social networking platforms use the "opt out approach" for their features. If users wish to protect their privacy, it is user's own responsibility to check and change the privacy settings as a number of them are set to default option. A major social network platforms have developed geo-tag functions and are in popular usage. This is concerning because 39% of users have experienced profiling hacking; 78% burglars have used major social media networks and Google Street-view to select their victims; and an astonishing 54% of burglars attempted to break into empty houses when people posted their status updates and geo-locations. === Facebook === Formation and maintenance of social media accounts and their relationships with other accounts are associated with various social outcomes. In 2015, for many firms, customer relationship management is essential and is partially done through Facebook. Before the emergence and prevalence of social media, customer identification was primarily based upon information that a firm could directly acquire: for example, it may be through a customer's purchasing process or voluntary act of completing a survey/loyalty program. However, the rise of social media has greatly reduced the approach of building a customer's profile/model based on available data. Marketers now increasingly seek customer information through Facebook; this may include a variety of information users disclose to all users or partial users on Facebook: name, gender, date of birth, e-mail address, sexual orientation, marital status, interests, hobbies, favorite sports team(s), favorite athlete(s), or favorite music, and more importantly, Facebook connections. However, due to the privacy policy design, acquiring true information on Facebook is no trivial task. Often, Facebook users either refuse to disclose true information (sometimes using pseudonyms) or setting information to be only visible to friends, Facebook users who "LIKE" your page are also hard to identify. To do online profiling of users and cluster users, marketers and companies can and will access the following kinds of data: gender, the IP address and city of each user through the Facebook Insight page, who "LIKED" a certain user, a page list of all the pages that a person "LIKED" (transaction data), other people that a user follow (even if it exceeds the first 500, which we usually can not see) and all the publicly shared data. === Twitter === First launched on the Internet in March 2006, Twitter is a platform on which users can connect and communicate with any other user in just 280 characters. Like Facebook, Twitter is also a crucial tunnel for users to leak important information, often unconsciously, but able to be accessed and collected by others. According to Rachel Nuwer, in a sample of 10.8 million tweets by more than 5,000 users, their posted and publicly shared information are enough to reveal a user's income range. A postdoctoral researcher from the University of Pennsylvania, Daniel Preoţiuc-Pietro and his colleagues were able to categorize 90% of users into corresponding income groups. Their existing collected data, after being fed into a machine-learning model, generated reliable predictions on the characteristics of each income group. The mobile app called Streamd.in displays live tweets on Google Maps by using geo-location details attached to the tweet, and traces the user's movement in the real world. === Profiling photos on social network === The advent and universality of social media networks have boosted the role of images and visual information dissemination. Many types of visual information on social media transmit messages from the author, location information and other personal information. For example, a user may post a photo of themselves in which landmarks are visible, which can enable other users to determine where they are. In a study done by Cristina Segalin, Dong Seon Cheng and Marco Cristani, they found that profiling user posts' photos can reveal personal traits such as personality and mood. In the study, convolutional neural networks (CNNs) is introduced. It builds on the main characteristics of computational

GoodRx

GoodRx Holdings, Inc. is an American healthcare company that operates a telemedicine platform and free-to-use website and mobile app that track prescription drug prices in the United States and provide drug coupons for discounts on medications. GoodRx compares prescription drug prices at more than 75,000 pharmacies in the United States. The platform allows users to consult a doctor online and obtain a prescription for certain types of medications. == History == === Financial performance === GoodRx was founded in Santa Monica, California in 2011. GoodRx experienced substantial growth in net income in 2017 ($9 million), 2018 ($44 million), and 2019 ($66 million), but recorded a loss of $293.6 million in 2020 due to IPO-related expenses. In September 2020, GoodRx went public on the Nasdaq under the ticker symbol GDRX. The company priced its initial public offering at $33 per share, above the expected range of $24 to $28, raising more than $1.1 billion at an initial valuation of approximately $12.7 billion. In the first half of 2020, the company reported revenues of $257 million and net income of $55 million. GoodRx generated $745.4 million in revenue for the full year 2021, a 35.36% increase over 2020. During the first half of 2021, the company’s share price declined by 10.7%. The decline was attributed to increased competition in online pharmacy services and slower user growth. GoodRx reported full-year revenue of $766.6 million, with adjusted EBITDA reaching $213.5 million, exceeding guidance in the fourth quarter. GoodRx reported that 41% of prescriptions filled using its coupons were newly adherent, meaning they would not have been filled without the service. GoodRx reported a full-year 2023 revenue of $750.3 million, a decrease of 2.1% from 2022. However, its fourth-quarter revenue increased by 7% year-over-year. GoodRx achieved an Adjusted EBITDA of $217.4 million for the year and an Adjusted EBITDA Margin of 28.6%. In 2024, GoodRx achieved 6% revenue growth with $792.3 million for the full year and turned a net loss into a positive net income of $16.4 million. The company also demonstrated strong operational efficiency, with a 32.8% increase in full-year Adjusted EBITDA. In Q2 2025, GoodRx reported revenue of $203.1 million, a 1.2% increase from the previous year, and a net income of $12.8 million, a significant 92% jump, which resulted in a 6.3% net income margin. However, prescription transaction revenue declined by 3% due to a decrease in monthly active consumers, but this was offset by strong 32% growth in its Pharma Manufacturer Solutions business. GoodRx also saw a 7% decrease in subscription revenue. === Mergers and acquisitions === In 2019, GoodRx acquired HeyDoctor, a telemedicine company, to integrate virtual healthcare services into the platform. In 2021, a health video content producer, HealthiNation was acquired by GoodRx, which helped provide consumers with health information and offered pharmaceutical manufacturers new ways to reach relevant audiences. In April 2022, GoodRx acquired VitaCare Prescription Services from TherapeuticsMD to strengthen its pharma manufacturer solutions business. === Partnerships === In 2017, the company announced partnerships with major pharmaceutical companies to negotiate lower prescription drug costs. GoodRx has deep relationships with major pharmacy chains, including Walgreens, Walmart, CVS Caremark, and Publix, to allow customers to use GoodRx discounts and Gold benefits. GoodRx began its partnership with CVS Caremark in July 2023 to automatically apply coupons to insured CVS customers purchasing generic prescriptions at certain locations. In April 2024, GoodRx added Publix into its network, allowing GoodRx Gold members to use their cards at Publix Pharmacies. GoodRx partners with Pharmacy Benefit Management like Caremark, Express Scripts, and MedImpact to apply their savings directly to eligible insurance plans and members. GoodRx partners with companies like Affirm, Benefitfocus, and DoorDash to integrate their services that offer members discounts and financial flexibility for prescriptions. GoodRx also partners with organizations like the American Academy of Family Physicians Foundation to support broader access to care. In October 2022, GoodRx launched Provider Mode, which allows healthcare providers to use the app to compare costs of drugs for patients based on different payment methods and drug alternatives. In 2025, GoodRx partnered with Novo Nordisk to offer discounted cash-pay access to semaglutide products like Ozempic and Wegovy through its platform and participating pharmacies. == Products and services == GoodRx started its telemedicine service GoodRx Care in September 2019. It lets people talk to a licensed provider online for common issues and get prescriptions even if they don't have insurance. They also run condition-specific subscription plans that bundle online doctor visits, FDA-approved meds, and home delivery into one monthly payment. On the weight management side, GoodRx offers prescriptions for GLP-1 drugs like semaglutide through their telemedicine platform. This got a boost when the oral version of Wegovy became widely available in the US in early 2026. GoodRx works with drug makers like Novo Nordisk to make some medications (including semaglutide options) more affordable for people paying cash. The telemedicine part took off after GoodRx bought HeyDoctor in 2019 and brought their virtual care tools into the main platform. == Key people == The Santa Monica-based startup was founded in September 2011 by Trevor Bezdek and former Facebook executives Doug Hirsch and Scott Marlette. Marlette was one of the first 20 employees at Facebook and built Facebook's photo application. In 2005, Hirsch was the Vice President of Product at Facebook, working closely with Mark Zuckerberg. Bezdek and Hirsch served as co-chief executive officers until April 2023, when they stepped down from those roles and technology executive Scott Wagner was appointed interim chief executive officer. Bezdek became chair of the board, while Hirsch took on the role of chief mission officer. In December 2024, GoodRx announced that healthcare executive Wendy Barnes would become president and chief executive officer effective January 1, 2025. As of 2025, Barnes serves as the company’s CEO, while Trevor Bezdek and Scott Wagner serve as co-chairs of the board, and Doug Hirsch remains involved as a co-founder and senior executive. == Controversy == On February 25, 2020, Consumer Reports published an article stating that GoodRx shared user data—specifically, pseudonymized advertising ID numbers that companies use to track the behavior of web users across websites, the names of the drugs that users browsed, and the pharmacies where users sought to fill prescriptions—with Google, Facebook, and around twenty other Internet-based companies. A few days later, GoodRx released a statement saying that it had made changes to prevent user search data on medical conditions and pharmaceuticals from being shared with Facebook. In March 2020, GoodRx stopped sending data about user prescriptions to Facebook. On February 1, 2023, the Federal Trade Commission fined GoodRx US$1.5 million for violations of the Breach Notification Rule and the Federal Trade Commission Act for allegedly failing to obtain specific, informed, and unambiguous consent from users before disclosing health-related information to Facebook and Google. In November 2024, independent pharmacies filed at least three class action lawsuits against GoodRx and major pharmacy benefit managers. The cases, brought by independent pharmacies in California, Michigan, Pennsylvania, and Rhode Island, allege that GoodRx and the PBMs collaborated to suppress reimbursements for generic prescription drugs. They allege that agreements using GoodRx’s software suppressed reimbursements for generic drugs and violated the Sherman Antitrust Act. The suits claim the practices amount to price fixing which harms small pharmacies while benefiting PBMs and their affiliates. GoodRx settled both the 2023 FTC action and the 2025 class action lawsuit without admitting wrongdoing.

Key Transparency

Key Transparency allows communicating parties to verify public keys used in end-to-end encryption. In many end-to-end encryption services, to initiate communication a user will reach out to a central server and request the public keys of the user with which they wish to communicate. If the central server is malicious or becomes compromised, a man-in-the-middle attack can be launched through the issuance of incorrect public keys. The communications can then be intercepted and manipulated. Additionally, legal pressure could be applied by surveillance agencies to manipulate public keys and read messages. With Key Transparency, public keys are posted to a public log that can be universally audited. Communicating parties can verify public keys used are accurate.